README

MMIX is a computer intended to illustrate machine-level aspects of programming. 
In my books The Art of Computer Programming, it replaces MIX, the 1960s-style 
machine that formerly played such a role. MMIX’s so-called RISC (“Reduced Instruction
Set Computer”) architecture is much better able to represent the computers
being built at the turn of the millennium. 

I strove to design MMIX so that its machine language would be simple, elegant, and 
easy to learn. At the same time I was careful to include all of the complexities needed 
to achieve high performance in practice, so that MMIX could in principle be built and 
even perhaps be competitive with some of the fastest general-purpose computers in 
the marketplace. I hope that MMIX will therefore prove to be a useful vehicle for 
people who are studying how to improve compilers and operating systems, and that 
other authors will like MMIX well enough to make use of it in their own textbooks. 
My goal in this work is to provide a clean, complete, and well-documented
“machine-independent machine” that people all over the world will be able to use as a testbed
for long-term research projects of lasting value, even as real computers continue to 
change rapidly. 

This book is a collection of programs that make MMIX a virtual reality. One of the 
programs is an assembler, MMIXAL, which converts MMIX symbolic files to MMIX object 
files. There also are two simulators, which execute the programs in given object files. 
The first simulator, called MMIX-SIM or simply MMIX, executes a program one
instruction at a time and allows convenient debugging. The second simulator, MMMIX,
simulates a high-performance pipeline in which many aspects of the computation are
overlapped in time. MMMIX is in fact a highly configurable “meta-simulator,” capable
of simulating an enormous variety of different kinds of pipelines with any number
of functional units and with many possible strategies for caching, virtual address 
translation, branch prediction, super-scalar instruction issue, etc., etc. 

The programs in this book are somewhat primitive, because they all are based on 
a simple terminal interface: Users type commands and the computer types out a 
reply. Still, these programs are adequate to provide a basis for future developments. 
I’m hoping that at least one reader of this book will discover how much fun MMIX 
programming can be and will be motivated to create a nice graphical interface, so 
that other people will more easily be able to join in the fun. I don’t have the time or 
talent to construct a good GUI myself, but I’ve tried to write the programs in such a 
way that modifications and enhancements will be easy to make. 

The latest versions of all these programs can be downloaded from MMIX’s home page 

http://mmix.cs.hm.edu/

in a file named mmix-YYYYMMDD.tar.gz. The programs are copyrighted, but anyone 
can use them without charge. Furthermore I explicitly allow anybody to copy and 
modify the programs in any way they like, provided only that the computer files 
are given different names whenever they have been changed. Only my designated 
successors in Munich are allowed to make a correction or addition to the copyrighted 
file mmixal.w, for example, unless the corrected file is identified by some other name 
(possibly ‘turbo-mmixal.w’ or ‘mmixal++.w’, etc.).

The programs are all written in CWEB, a language that combines C with TeX in such
a way that standard preprocessors can easily convert mmixal.w into a compilable file
mmixal.c or a documentation file mmixal.tex. CWEB also includes a “change file”
mechanism by which people can easily customize a master source file like mmixal.w
without changing the master file in any way. (See 

http://www-cs-faculty.stanford.edu/~knuth/cweb.html

for complete information about CWEB, including installation instructions for the related 
software.) Readers of the present book who are unfamiliar with CWEB might want to 
refer to the notes on “How to read CWEB programs” that appear on pages 70-73 of my 
book The Stanford GraphBase (New York: ACM Press, 1993), but the general ideas 
are almost self-explanatory so I decided not to reprint those notes here. 

During the next several years, as I write Volume 4 of The Art of Computer
Programming, I plan to prepare updates to Volumes 1-3 whenever Volume 4 needs to
refer to new material that belongs more properly in earlier volumes. These updates, 
called “fascicles,” will be available on the Internet via 

http://www-cs-faculty.stanford.edu/~knuth/taocp.html

and they will also be published in hardcopy form. The first such fascicle is already 
finished and available for downloading; it is a tutorial introduction to MMIX and the 
MMIX assembly language. Everybody who is seriously interested in MMIX should read 
that First Fascicle, preferably before reading the programs in the present book. 

I’ve tried to make the MMIXware programs interesting to read as well as useful.
Indeed, the MMIX-PIPE program, which is the chief component of the MMMIX
meta-simulator, is one of the most instructive programs I’ve ever had the pleasure of writing.
But I don’t expect a great number of people to study every part of this book closely, or 
even to study every part of MMIX-PIPE. The main purpose of this book is to provide 
a complete documentation of the MMIX computer and its assembly language. Many 
details about MMIX were too “picky” or too system-oriented to be appropriate for the 
First Fascicle, but every detail about MMIX can be found in the present book. 

After the MMIXware programs have been installed on a UNIX-like system, they are
typically used as follows. First a user program is written in assembly language and 
put into a file, say foo.mms. (The suffix .mms stands for “MMIX symbolic.”) Then the 
command 

mmixal foo.mms 

will translate it into an object file, foo.mmo. Alternatively, a command such as

mmixal -l foo.lst foo.mms

could be used; this would produce a listing file, foo.lst, in addition to foo.mmo.
The listing file, when printed, would show the contents of foo.mms together with the 
assembled machine language instructions. 

Once an object file like foo.mmo exists, it can be run on the simple simulator by
issuing a command such as 

mmix foo 

(or mmix foo.mmo). Many options are also possible; for example,

mmix -s foo

will print running time statistics when the program ends;

mmix -P foo

will print a profile that shows exactly how often each instruction was executed;

mmix -v foo

will give “verbose” details about everything the simulator did;

mmix -t2 foo

will trace each instruction the first two times it is performed; and so on. Also

mmix -i foo

will run the simulator in interactive mode, obeying various online commands by which 
the user can watch exactly what is happening when key parts of the program are 
reached. The command 

mmix foo bar 

will run the simulator as if MMIX itself were running the command ‘foo bar’ with a 
rudimentary operating system; any number of command-line arguments can follow 
the name of the program being simulated. 

The MMMIX meta-simulator can also be applied to the same program, although a 
bit more preparation is necessary. First the command 

mmix -Dfoo.mmb foo bar 

will dump out a binary file foo.mmb containing the information needed to load ‘foo 
bar’ into MMIX’s memory. Then a command like 

mmmix plain.mmconfig foo.mmb

will invoke the meta-simulator with a “plain” pipeline configuration. The
meta-simulator always runs interactively, using the prompt ‘mmmix>’ when it wants
instructions about what to do next. Users can type ‘?’ in response to this prompt if
they want to be reminded about what the simulator can do. Typical responses are
‘vff’ (run verbosely); ‘v0’ (run quietly); ‘p’ (show the pipeline); ‘g255’ (show global
register 255); ‘D’ (show the D-cache); ‘b200’ (pause when location #200 is fetched);
‘1000’ (run 1000 cycles); etc. Some familiarity with MMIX-PIPE is necessary to
understand the meta-simulator’s reports of its activity, but users of mmmix are assumed
to be able to extract high-level information from a mass of low-level details. (This
talent, after all, is the hallmark of a computer scientist.) 

The programs in this book appear in alphabetical order: 

MMIX explains everything about the MMIX architecture. 

MMIX-ARITH contains subroutines for 64-bit fixed and floating point arithmetic, 
using only 32-bit fixed point arithmetic. 

MMIX-CONFIG processes configuration files for MMMIX. 

MMIX-IO contains subroutines for the primitive input/output operations of a
rudimentary operating system.

MMIX-MEM handles memory references of MMMIX in special cases associated with 
memory-mapped input/output. 

MMIX-PIPE does the hard work of pipeline simulation. 

MMIX-SIM is the program for the non-pipelined simulator. 

MMIXAL is the assembly program. 

MMMIX is the driver program for the meta-simulator. 

MMOTYPE is a utility program that translates an MMIX object file into
human-readable form.

The first of these, MMIX, is not actually a program, although it has been formatted 
as a CWEB document; it is a complete definition of MMIX, including the details of 
features that are used only by the operating system. It should be read first, but the 
other programs can be read in any order. (Actually MMIXAL or MMIX-SIM should 
probably be read next after MMIX, and MMIX-PIPE last. The program MMIX-SIM is 
the line-at-a-time simulator that is known simply as mmix after it has been compiled.) 

Mini-indexes have been provided on each right-hand page of this book so that the 
programs can be read essentially as hypertext. Every identifier that is used on a
two-page spread but defined on some other page is listed in the mini-index. For example, a
mini-index entry such as ‘oplus: octa ( ), MMIX-ARITH §5’ means that the identifier
oplus denotes a function defined in §5 of the MMIX-ARITH module, returning
a value of type octa. A master index to all uses of all identifiers appears at the end 
of this book. 

Happy hacking! 

Donald E. Knuth
Cambridge, Massachusetts 
17 October 1999 



CONTENTS

README (a preface)
WELCOME (an explanation)
MMIX (a definition)
MMIX-ARITH (a library)
MMIX-CONFIG (a part of MMMIX)
MMIX-IO (a library)
MMIX-MEM (a triviality)
MMIX-PIPE (a part of MMMIX)
MMIX-SIM (a simulator)
MMIXAL (an assembler)
MMMIX (a meta-simulator)
MMOTYPE (a utility program)
Master Index (a table of references)




Welcome to the revised printing of MMIXware, which incorporates hundreds of
detailed changes suggested by readers of the original 1999 printing. I thank the staff at
Springer for providing this opportunity to make an archival, corrected version of the 
entire text. 

The current printing documents Version 1 of MMIX, and it corresponds to the 
programs of mmix-20131017.tgz. Version 1 is permanently frozen, and “bug-free by
definition.” All future developments can be accessed via the MMIX home page cited 
above in the README section (the “frontmatter”). 

Each of the following ten chapters begins on a left-hand page, and represents a 
component of the official CWEB source files for MMIX Version 1. 

DEK, 17 October 2013 

MMIX

1. Introduction to MMIX. Thirty-eight years have passed since the MIX
computer was designed, and computer architecture has been converging during those years
towards a rather different style of machine. Therefore it is time to replace MIX with 
a new computer that contains even less saturated fat than its predecessor. 

Exercise 1.3.1-25 in the third edition of Fundamental Algorithms speaks of an 
extended MIX called MixMaster, which is upward compatible with the old version. 
But MixMaster itself is hopelessly obsolete; although it allows for several gigabytes of 
memory, we can’t even use it with ASCII code to get lowercase letters. And ouch, the 
standard subroutine calling convention of MIX is irrevocably based on self-modifying 
code! Decimal arithmetic and self-modifying code were popular in 1962, but they sure 
have disappeared quickly as machines have gotten bigger and faster. A completely 
new design is called for, based on the principles of RISC architecture as expounded 
in Computer Architecture by Hennessy and Patterson (Morgan Kaufmann, 1996). 

So here is MMIX, a computer that will totally replace MIX in the “ultimate” editions 
of The Art of Computer Programming, Volumes 1-3, and in the first editions of the 
remaining volumes. I must confess that I can hardly wait to own a computer like this. 

How do you pronounce MMIX? I’ve been saying “em-mix” to myself, because the 
first ‘M’ represents a new millennium. Therefore I use the article “an” instead of “a” 
before the name MMIX in English phrases like “an MMIX simulator.” 

Incidentally, the Dictionary of American Regional English 3 (1996) lists “mommix” 
as a common dialect word used both as a noun and a verb; to mommix something 
means to botch it, to bollix it. Only time will tell whether I have mommixed the 
definition of MMIX. 

2. The original MIX computer could be operated without an operating system; you 
could bootstrap it with punched cards or paper tape and do everything yourself. But 
nowadays such power is no longer in the hands of ordinary users. The MMIX hardware, 
like all other computing machines made today, relies on an operating system to get 
jobs started in their own address spaces and to provide I/O capabilities. 

Whenever anybody has asked if I will be writing about operating systems, my reply 
has always been “Nix.” Therefore the name of MMIX’s operating system, NNIX, will 
come as no surprise. From time to time I will necessarily have to refer to things that 
NNIX does for its users, but I am unable to build NNIX myself. Life is too short. 
It would be wonderful if some expert in operating system design became inspired to 
write a book that explains exactly how to construct a nice, clean NNIX kernel for an 
MMIX chip. 

3. I am deeply grateful to the many people who have helped me shape the
behavior of MMIX. In particular, John Hennessy and (especially) Dick Sites have made
significant contributions. 



4. A programmer’s introduction to MMIX appears in “Volume 1, Fascicle 1,” a 
booklet containing tutorial material that will ultimately appear in the fourth edition of 
The Art of Computer Programming. The description in the following sections is rather 
different, because we are concerned about a complete implementation, including all 
of the features used by the operating system and invisible to normal programs. Here 
it is important to emphasize exceptional cases that were glossed over in the tutorial, 
and to consider nitpicky details about things that might go wrong. 

5. MMIX basics. MMIX is a 64-bit RISC machine with at least 256
general-purpose registers and a 64-bit address space. Every instruction is four bytes long and
has the form 



        OP | X | Y | Z



The 256 possible OP codes fall into a dozen or so easily remembered categories; an 
instruction usually means, “Set register X to the result of Y OP Z.” For example, 



        32 | 1 | 2 | 3



sets register 1 to the sum of registers 2 and 3. A few instructions combine the Y and 
Z bytes into a 16-bit YZ field; two of the jump instructions use a 24-bit XYZ field. 
But the three bytes X, Y, Z usually have three-pronged significance independent of 
each other. 

Instructions are usually represented in a symbolic form corresponding to the MMIX 
assembly language, in which each operation code has a mnemonic name. For example, 
operation 32 is ADD, and the instruction above might be written ‘ADD $1,$2,$3’; a 
dollar sign ‘$’ symbolizes a register number. In general, the instruction ADD $X,$Y,$Z
is the operation of setting $X = $Y + $Z. An assembly language instruction with two
commas has three operand fields X, Y, Z; an instruction with one comma has two 
operand fields X, YZ; an instruction with no comma has one operand field, XYZ; an 
instruction with no operands has X = Y = Z = 0. 
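
As a rough illustration of the field layout just described, the following C fragment splits a four-byte instruction into its OP, X, Y, and Z bytes, together with the combined YZ and XYZ fields. It is only a sketch and is not taken from the MMIXware sources: the type mmix_fields and the function mmix_decode are hypothetical names, and the fragment assumes that the instruction is held in a 32-bit integer with OP in the most significant byte.

```c
#include <stdint.h>

/* Fields of one four-byte instruction (illustrative names, not MMIXware's). */
typedef struct {
    uint8_t  op;   /* operation code: most significant byte */
    uint8_t  x;    /* X byte */
    uint8_t  y;    /* Y byte */
    uint8_t  z;    /* Z byte: least significant byte */
    uint16_t yz;   /* Y and Z read together as one 16-bit field */
    uint32_t xyz;  /* X, Y, Z read together as one 24-bit field */
} mmix_fields;

/* Split a 32-bit instruction word into its fields. */
static mmix_fields mmix_decode(uint32_t insn)
{
    mmix_fields f;
    f.op  = (uint8_t)(insn >> 24);
    f.x   = (uint8_t)(insn >> 16);
    f.y   = (uint8_t)(insn >> 8);
    f.z   = (uint8_t)insn;
    f.yz  = (uint16_t)(insn & 0xFFFFu);
    f.xyz = insn & 0xFFFFFFu;
    return f;
}
```

Under these assumptions the word 0x20010203 decodes to OP = 32, X = 1, Y = 2, Z = 3, which is the instruction ADD $1,$2,$3 discussed above.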

Most instructions have two forms, one in which the Z field stands for register $Z, and 
one in which Z is an unsigned “immediate” constant. Thus, for example, the command 
‘ADD $X,$Y,$Z’ has a counterpart ‘ADD $X,$Y,Z’, which sets $X = $Y + Z. Immediate 
constants are always nonnegative. In the descriptions below we will introduce such 
pairs of instructions by writing just ‘ADD $X,$Y,$Z|Z’ instead of naming both cases 
explicitly. 

The operation code for ADD $X,$Y,$Z is 32, but the operation code for ADD $X,$Y,Z 
is 33. The MMIX assembler chooses the correct code by noting whether the third 
argument is a register number or not. 
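
The difference between opcodes 32 and 33 can be sketched in C as follows. This is a hypothetical fragment, not the simulator's code: exec_add, OP_ADD, and OP_ADDI are made-up names, the register file is modeled as an array of 256 unsigned 64-bit values, and the overflow test that a real ADD performs is deliberately omitted.

```c
#include <stdint.h>

enum { OP_ADD = 32, OP_ADDI = 33 };  /* illustrative names for the two opcodes */

/* Execute ADD in either form; returns 1 if done, 0 if the opcode is not ADD.
   The sum wraps modulo 2^64; the overflow exception is not modeled here. */
static int exec_add(uint32_t insn, uint64_t reg[256])
{
    uint8_t op = (uint8_t)(insn >> 24);
    uint8_t x  = (uint8_t)(insn >> 16);
    uint8_t y  = (uint8_t)(insn >> 8);
    uint8_t z  = (uint8_t)insn;
    uint64_t operand;

    if (op == OP_ADD)
        operand = reg[z];   /* Z names a register */
    else if (op == OP_ADDI)
        operand = z;        /* Z is an unsigned immediate constant, 0..255 */
    else
        return 0;
    reg[x] = reg[y] + operand;
    return 1;
}
```

With register 1 initially 10, the word 0x21010105 (opcode 33, corresponding to ADD $1,$1,5) leaves 15 in register 1, and the word 0x20010101 (opcode 32, corresponding to ADD $1,$1,$1) then doubles it to 30.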

Register numbers and constants can be given symbolic names; for example, the
assembly language instruction ‘x IS $1’ makes x an abbreviation for register number 1.
Similarly, ‘FIVE IS 5’ makes FIVE an abbreviation for the constant 5. After these
abbreviations have been specified, the instruction ADD x,x,FIVE increases $1 by 5, using
opcode 33, while the instruction ADD x,x,x doubles $1 using opcode 32. Symbolic
names that stand for register numbers conventionally begin with a lowercase letter, 
while names that stand for constants conventionally begin with an uppercase letter. 
This convention is not actually enforced by the assembler, but it tends to reduce a 
programmer’s confusion. 

6. A nybble is a 4-bit quantity, often used to denote a decimal or hexadecimal digit. 
A byte is an 8-bit quantity, often used to denote an alphanumeric character in ASCII 
code. The Unicode standard extends ASCII to essentially all the world’s languages by 
using 16-bit-wide characters called wydes. (Weight watchers know that two nybbles
make one byte, but two bytes make one wyde.) In the discussion below we use the
term tetrabyte or “tetra” for a 4-byte quantity, and the similar term octabyte or “octa”
for an 8-byte quantity. Thus, a tetra is two wydes, an octa is two tetras; an octabyte 
has 64 bits. Each MMIX register can be thought of as containing one octabyte, or two 
tetras, or four wydes, or eight bytes, or sixteen nybbles. 

When bytes, wydes, tetras, and octas represent numbers they are said to be either 
signed or unsigned. An unsigned byte is a number between 0 and 2^8 − 1 = 255 inclusive;
an unsigned wyde lies, similarly, between 0 and 2^{16} − 1 = 65535; an unsigned tetra
lies between 0 and 2^{32} − 1 = 4,294,967,295; an unsigned octa lies between 0 and
2^{64} − 1 = 18,446,744,073,709,551,615. Their signed counterparts use the conventions
of two’s complement notation, by subtracting respectively 2^8, 2^{16}, 2^{32}, or 2^{64} times
the most significant bit. Thus, the unsigned bytes 128 through 255 are regarded as
the numbers −128 through −1 when they are evaluated as signed bytes; a signed byte
therefore lies between −128 and +127, inclusive. A signed wyde is a number between
−32768 and +32767; a signed tetra lies between −2,147,483,648 and +2,147,483,647; a
signed octa lies between −9,223,372,036,854,775,808 and +9,223,372,036,854,775,807.
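
The rule “subtract 2^8, 2^{16}, 2^{32}, or 2^{64} times the most significant bit” can be tried out with a small C sketch. The function name as_signed is hypothetical, and the fragment assumes that nbytes is 1, 2, 4, or 8; the octabyte case is handled separately because 2^{64} itself does not fit in a 64-bit variable.

```c
#include <stdint.h>

/* Read the low nbytes bytes of u as a two's complement number.
   Assumes nbytes is 1, 2, 4, or 8 (byte, wyde, tetra, octa). */
static int64_t as_signed(uint64_t u, int nbytes)
{
    if (nbytes == 8) {
        /* u - 2^64 when the top bit is set, computed without overflow */
        if (u >> 63)
            return -(int64_t)(~u) - 1;
        return (int64_t)u;
    }
    int bits = 8 * nbytes;
    uint64_t mask = ((uint64_t)1 << bits) - 1;
    uint64_t v = u & mask;              /* the unsigned value */
    uint64_t msb = v >> (bits - 1);     /* most significant bit, 0 or 1 */
    return (int64_t)v - (int64_t)(msb << bits);
}
```

For instance, as_signed(128, 1) yields −128 and as_signed(255, 1) yields −1, in agreement with the ranges stated above.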

The virtual memory of MMIX is an array M of 2^{64} bytes. If k is any unsigned
octabyte, M[k] is a 1-byte quantity. MMIX machines do not actually have such vast
memories, but programmers can act as if 2^{64} bytes are indeed present, because MMIX
provides address translation mechanisms by which an operating system can maintain 
this illusion. 

We use the notation M_{2^t}[k] to stand for a number consisting of 2^t consecutive bytes
starting at location k ∧ (2^{64} − 2^t). (The notation k ∧ (2^{64} − 2^t) means that the least
significant t bits of k are set to 0, and only the least 64 bits of the resulting address
are retained. Similarly, the notation k ∨ (2^t − 1) means that the least significant t bits
of k are set to 1.) All accesses to 2^t-byte quantities by MMIX are aligned, in the sense
that the address of the first byte is a multiple of 2^t.

Addressing is always “big-endian.” In other words, the most significant (leftmost)
byte of M_{2^t}[k] is M_1[k ∧ (2^{64} − 2^t)] and the least significant (rightmost) byte is
M_1[k ∨ (2^t − 1)]. We use the notation s(M_{2^t}[k]) when we want to regard this
2^t-byte number as a signed integer. Formally speaking, if l = 2^t,

    s(M_l[k]) = (M_1[k ∧ (−l)] M_1[k ∧ (−l) + 1] ... M_1[k ∨ (l − 1)])_{256} − 2^{8l} [M_1[k ∧ (−l)] ≥ 128].
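
The alignment and big-endian conventions can be made concrete with a C sketch. It is an illustration under stated assumptions, not the simulators' memory model: mem_read is a hypothetical name, a small byte array stands in for the 2^{64}-byte memory, t is assumed to lie between 0 and 3, and the aligned span is assumed to fit inside the array.

```c
#include <stdint.h>

/* Read the 2^t-byte quantity containing address k from a small byte array
   that stands in for memory. Assumes 0 <= t <= 3 and that the aligned span
   lies inside mem. */
static uint64_t mem_read(const uint8_t *mem, uint64_t k, int t)
{
    uint64_t size  = (uint64_t)1 << t;
    uint64_t first = k & ~(size - 1);  /* k AND (2^64 - 2^t): low t bits cleared */
    uint64_t last  = k | (size - 1);   /* k OR (2^t - 1): low t bits set */
    uint64_t value = 0;

    /* Big-endian: the byte at the lowest address is the most significant. */
    for (uint64_t a = first; ; a++) {
        value = (value << 8) | mem[a];
        if (a == last)
            break;
    }
    return value;
}
```

If the eight bytes at addresses 0 through 7 are 01 23 45 67 89 ab cd ef (hexadecimal), then reading with k = 5 gives ab for t = 0, 89ab for t = 1, 89abcdef for t = 2, and 0123456789abcdef for t = 3, because the low t bits of the address are ignored.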

7. Loading and storing. Several instructions can be used to get information
from memory into registers. For example, the “load tetra unsigned” instruction LDTU
$1,$4,$5 puts the four bytes M_4[$4 + $5] into register 1 as an unsigned integer; the
most significant four bytes of register 1 are set to zero. The similar instruction LDT
$1,$4,$5, “load tetra,” sets $1 to the signed integer s(M_4[$4 + $5]). (Instructions
generally treat numbers as signed unless the operation code specifically calls them
unsigned.) In the signed case, the most significant four bytes of the register will be
copies of the most significant bit of the tetrabyte loaded; thus they will be all 0s or
all 1s, depending on whether the number is ≥ 0 or < 0.

• LDB $X,$Y,$Z|Z ‘load byte’. 

Byte s(M[$Y + $Z]) or s(M[$Y + Z]) is loaded into register X as a signed number 
between −128 and +127, inclusive.

• LDBU $X,$Y,$Z|Z ‘load byte unsigned’. Byte M[$Y + $Z] or M[$Y + Z] is loaded 
into register X as an unsigned number between 0 and 255, inclusive. 

• LDW $X,$Y,$Z|Z ‘load wyde’. 

Bytes s(M_2[$Y + $Z]) or s(M_2[$Y + Z]) are loaded into register X as a signed number
between −32768 and +32767, inclusive. As mentioned above, our notation M_2[k]
implies that the least significant bit of the address $Y + $Z or $Y + Z is ignored and
assumed to be 0.

• LDWU $X,$Y,$Z|Z ‘load wyde unsigned’. Bytes M_2[$Y + $Z] or M_2[$Y + Z] are
loaded into register X as an unsigned number between 0 and 65535, inclusive. 

• LDT $X,$Y,$Z|Z ‘load tetra’. 

Bytes s(M_4[$Y + $Z]) or s(M_4[$Y + Z]) are loaded into register X as a signed number
between −2,147,483,648 and +2,147,483,647, inclusive. As mentioned above, our
notation M_4[k] implies that the two least significant bits of the address $Y + $Z or
$Y + Z are ignored and assumed to be 0.

• LDTU $X,$Y,$Z|Z ‘load tetra unsigned’.

Bytes M_4[$Y + $Z] or M_4[$Y + Z] are loaded into register X as an unsigned number
between 0 and 4,294,967,295, inclusive.

• LDO $X,$Y,$Z|Z ‘load octa’. 

Bytes M_8[$Y + $Z] or M_8[$Y + Z] are loaded into register X. As mentioned above,
our notation M_8[k] implies that the three least significant bits of the address $Y + $Z
or $Y + Z are ignored and assumed to be 0.

• LDOU $X,$Y,$Z|Z ‘load octa unsigned’.

Bytes M_8[$Y + $Z] or M_8[$Y + Z] are loaded into register X. There is in fact no
difference between the behavior of LDOU and LDO, since an octabyte can be regarded
as either signed or unsigned. LDOU is included in MMIX just for completeness and 
consistency, in spite of the fact that a foolish consistency is the hobgoblin of little 
minds. (Niklaus Wirth made a strong plea for such consistency in his early critique
of System/360; see JACM 15 (1968), 37-74.)

• LDHT $X,$Y,$Z|Z ‘load high tetra’. 

Bytes M_4[$Y + $Z] or M_4[$Y + Z] are loaded into the most significant half of register X,
and the least significant half is cleared to zero. (One use of “high tetra arithmetic”
is to detect overflow easily when tetrabytes are added or subtracted.)

• LDA $X,$Y,$Z|Z ‘load address’. 

The address $Y + $Z or $Y + Z is loaded into register X. This instruction is simply 
another name for the ADDU instruction discussed below; it can be used when the 
programmer is thinking of memory addresses instead of numbers. The MMIX assembler 
converts LDA into the same OP-code as ADDU. 
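
The way the tetrabyte loads fill a 64-bit register can be sketched in C. The three function names below are hypothetical; each takes the tetrabyte already fetched from memory (the address arithmetic and alignment are left out) and returns the octabyte that would be placed in register X according to the descriptions above.

```c
#include <stdint.h>

/* LDTU: the upper four bytes of the register are set to zero. */
static uint64_t ldtu_result(uint32_t tetra)
{
    return (uint64_t)tetra;
}

/* LDT: the upper four bytes are copies of the tetrabyte's sign bit. */
static uint64_t ldt_result(uint32_t tetra)
{
    uint64_t r = tetra;
    if (tetra & UINT32_C(0x80000000))
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}

/* LDHT: the tetrabyte goes into the upper half; the lower half is cleared. */
static uint64_t ldht_result(uint32_t tetra)
{
    return (uint64_t)tetra << 32;
}
```

For the tetrabyte fffffffe (hexadecimal), this sketch gives 00000000fffffffe for LDTU, fffffffffffffffe for LDT, and fffffffe00000000 for LDHT.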

8. Another family of instructions goes the other way, storing registers into memory. 
For example, the “store octa immediate” command STO $3,$2,17 puts the current
contents of register 3 into M_8[$2 + 17].

• STB $X,$Y,$Z|Z ‘store byte’. 

The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z].
An integer overflow exception occurs if $X is not between −128 and +127. (We will
discuss overflow and other kinds of exceptions later.)

• STBU $X,$Y,$Z|Z ‘store byte unsigned’.

The least significant byte of register X is stored into byte M[$Y + $Z] or M[$Y + Z].
STBU instructions are the same as STB instructions, except that no test for overflow 
is made. 

• STW $X,$Y,$Z|Z ‘store wyde’. 

The two least significant bytes of register X are stored into bytes M_2[$Y + $Z] or
M_2[$Y + Z]. An integer overflow exception occurs if $X is not between −32768 and
+32767.

• STWU $X,$Y,$Z|Z ‘store wyde unsigned’.

The two least significant bytes of register X are stored into bytes M_2[$Y + $Z] or
M_2[$Y + Z]. STWU instructions are the same as STW instructions, except that no test
for overflow is made. 

• STT $X,$Y,$Z|Z ‘store tetra’. 

The four least significant bytes of register X are stored into bytes M_4[$Y + $Z] or
M_4[$Y + Z]. An integer overflow exception occurs if $X is not between −2,147,483,648
and +2,147,483,647.

• STTU $X,$Y,$Z|Z ‘store tetra unsigned’.

The four least significant bytes of register X are stored into bytes M_4[$Y + $Z] or
M_4[$Y + Z]. STTU instructions are the same as STT instructions, except that no test
for overflow is made. 

• STO $X,$Y,$Z|Z ‘store octa’.

Register X is stored into bytes M_8[$Y + $Z] or M_8[$Y + Z].

• STOU $X,$Y,$Z|Z ‘store octa unsigned’.

Identical to STO $X,$Y,$Z|Z.

• STCO X,$Y,$Z|Z ‘store constant octabyte’.

An octabyte whose value is the unsigned byte X is stored into M_8[$Y + $Z] or
M_8[$Y + Z].

• STHT $X,$Y,$Z|Z ‘store high tetra’.

The most significant four bytes of register X are stored into M_4[$Y + $Z] or M_4[$Y + Z].
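
A few of the store instructions can be sketched in C as follows. This is a hypothetical fragment rather than the simulators' code: store_be, stb, stbu, stco, and stht are made-up names, a small byte array stands in for memory, the register is passed as a raw 64-bit pattern, and the sketch assumes that STB stores its byte whether or not the overflow condition holds.

```c
#include <stdint.h>

/* Store the low 2^t bytes of value, big-endian, at the aligned address
   obtained by clearing the low t bits of k. Assumes 0 <= t <= 3 and that
   the span fits inside mem. */
static void store_be(uint8_t *mem, uint64_t k, int t, uint64_t value)
{
    uint64_t size  = (uint64_t)1 << t;
    uint64_t first = k & ~(size - 1);
    for (uint64_t i = 0; i < size; i++)
        mem[first + i] = (uint8_t)(value >> (8 * (size - 1 - i)));
}

/* STB: returns 1 when x, read as a signed octabyte, is outside -128..+127. */
static int stb(uint8_t *mem, uint64_t addr, uint64_t x)
{
    store_be(mem, addr, 0, x);
    /* signed range -128..127 is 0..127 or 2^64-128..2^64-1 as a raw pattern */
    return x > 127 && x < UINT64_C(0xFFFFFFFFFFFFFF80);
}

/* STBU: same store, no overflow test. */
static void stbu(uint8_t *mem, uint64_t addr, uint64_t x)
{
    store_be(mem, addr, 0, x);
}

/* STCO: store an octabyte whose value is the unsigned byte x. */
static void stco(uint8_t *mem, uint64_t addr, uint8_t x)
{
    store_be(mem, addr, 3, (uint64_t)x);
}

/* STHT: store the most significant four bytes of the register. */
static void stht(uint8_t *mem, uint64_t addr, uint64_t x)
{
    store_be(mem, addr, 2, x >> 32);
}
```

In this sketch, storing the pattern for −1 with stb reports no overflow and writes the byte ff, while storing 200 writes the byte c8 and reports overflow; stco with address 9 writes to the aligned octabyte at addresses 8 through 15.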

9. Adding and subtracting. Once numbers are in registers, we can compute
with them. Let’s consider addition and subtraction first.

• ADD $X,$Y,$Z|Z ‘add’.

The sum $Y + $Z or $Y + Z is placed into register X using signed, two’s complement
arithmetic. An integer overflow exception occurs if the sum is ≥ 2^{63} or < −2^{63}. (We
will discuss overflow and other kinds of exceptions later.) 

• ADDU $X,$Y,$Z|Z ‘add unsigned’. 

The sum ($Y + $Z) mod 2^{64} or ($Y + Z) mod 2^{64} is placed into register X. These
instructions are the same as ADD $X,$Y,$Z|Z commands except that no test for
overflow is made. (Overflow could be detected if desired by using the command
CMPU ovflo,$X,$Y after addition, where CMPU means “compare unsigned”; see below.)

• 2ADDU $X,$Y,$Z|Z ‘times 2 and add unsigned’. 

The sum (2$Y + $Z) mod 2^{64} or (2$Y + Z) mod 2^{64} is placed into register X.

• 4ADDU $X,$Y,$Z|Z ‘times 4 and add unsigned’.

The sum (4$Y + $Z) mod 2^{64} or (4$Y + Z) mod 2^{64} is placed into register X.

• 8ADDU $X,$Y,$Z|Z ‘times 8 and add unsigned’.

The sum (8$Y + $Z) mod 2^{64} or (8$Y + Z) mod 2^{64} is placed into register X.

• 16ADDU $X,$Y,$Z|Z ‘times 16 and add unsigned’.

The sum (16$Y + $Z) mod 2^{64} or (16$Y + Z) mod 2^{64} is placed into register X.

• SUB $X,$Y,$Z|Z ‘subtract’.

The difference $Y − $Z or $Y − Z is placed into register X using signed, two’s
complement arithmetic. An integer overflow exception occurs if the difference is ≥ 2^{63}
or < −2^{63}.

• SUBU $X,$Y,$Z|Z ‘subtract unsigned’. 

The difference ($Y − $Z) mod 2^{64} or ($Y − Z) mod 2^{64} is placed into register X. These
two instructions are the same as SUB $X,$Y,$Z|Z except that no test for overflow is 
made. 

• NEG $X,Y,$Z|Z ‘negate’.

The value Y − $Z or Y − Z is placed into register X using signed, two’s complement
arithmetic. An integer overflow exception occurs if the result is greater than 2^{63} − 1.
(Notice that in this case MMIX works with the “immediate” constant Y, not register Y.
NEG commands are analogous to the immediate variants of other commands, because
they save us from having to put one-byte constants into a register. When Y = 0,
overflow occurs if and only if $Z = −2^{63}. The instruction NEG $X,1,2 has exactly the
same effect as NEG $X,0,1.)

• NEGU $X,Y,$Z|Z ‘negate unsigned’. 

The value (Y − $Z) mod 2^{64} or (Y − Z) mod 2^{64} is placed into register X. NEGU
instructions are the same as NEG instructions, except that no test for overflow is 
made. 
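
The overflow conditions stated for ADD, SUB, and NEG, and the unsigned comparison suggested for ADDU, can be checked with a short C sketch. All function names here are hypothetical, registers are modeled as raw 64-bit patterns, and the functions merely return a flag where the machine would signal an integer overflow exception.

```c
#include <stdint.h>

/* ADD: overflow iff y and z have the same sign and the sum's sign differs. */
static int add_ovf(uint64_t y, uint64_t z, uint64_t *x)
{
    *x = y + z;                      /* wraps modulo 2^64 */
    return (int)((~(y ^ z) & (y ^ *x)) >> 63);
}

/* SUB: overflow iff y and z differ in sign and the difference's sign differs from y's. */
static int sub_ovf(uint64_t y, uint64_t z, uint64_t *x)
{
    *x = y - z;
    return (int)(((y ^ z) & (y ^ *x)) >> 63);
}

/* ADDU, followed by the unsigned comparison of the sum with y:
   the sum wrapped around exactly when it is smaller than y. */
static int addu_carry(uint64_t y, uint64_t z, uint64_t *x)
{
    *x = y + z;
    return *x < y;
}

/* 2ADDU, 4ADDU, 8ADDU, 16ADDU for shift = 1, 2, 3, 4. */
static uint64_t scaled_addu(unsigned shift, uint64_t y, uint64_t z)
{
    return (y << shift) + z;
}

/* NEG with immediate constant y_imm (0..255): overflow can occur only when
   z is negative and the result still appears negative. */
static int neg_ovf(uint8_t y_imm, uint64_t z, uint64_t *x)
{
    *x = (uint64_t)y_imm - z;
    return (int)((z & *x) >> 63);
}
```

Under these assumptions, adding 1 to 2^{63} − 1 sets the flag, and neg_ovf reports overflow for Y = 0 exactly when z is the pattern for −2^{63}; with z = 2 and Y = 1, or z = 1 and Y = 0, the result is the pattern for −1 in both cases.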



CMPU, §15. 






10. Bit fiddling. Before looking at multiplication and division, which take longer 
than addition and subtraction, let’s look at some of the other things that MMIX can 
do fast. There are eighteen instructions for bitwise logical operations on unsigned 
numbers. 

• AND $X,$Y,$Z|Z ‘bitwise and’. 

Each bit of register Y is logically anded with the corresponding bit of register Z or of 
the constant Z, and the result is placed in register X. In other words, a bit of register X 
is set to 1 if and only if the corresponding bits of the operands are both 1; in symbols, 
$X = $Y ∧ $Z or $X = $Y ∧ Z. This means in particular that AND $X,$Y,Z always
zeroes out the seven most significant bytes of register X, because 0s are prefixed to
the constant byte Z.

• OR $X,$Y,$Z|Z ‘bitwise or’. 

Each bit of register Y is logically ored with the corresponding bit of register Z or 
of the constant Z, and the result is placed in register X. In other words, a bit of 
register X is set to 0 if and only if the corresponding bits of the operands are both 0; 
in symbols, $X = $Y ∨ $Z or $X = $Y ∨ Z.

In the special case Z = 0, the immediate variant of this command simply copies 
register Y to register X. The MMIX assembler allows us to write ‘SET $X,$Y’ as a 
convenient abbreviation for ‘OR $X,$Y,0’. 

• XOR $X,$Y,$Z|Z ‘bitwise exclusive-or’. 

Each bit of register Y is logically xored with the corresponding bit of register Z or 
of the constant Z, and the result is placed in register X. In other words, a bit of 
register X is set to 0 if and only if the corresponding bits of the operands are equal; 
in symbols, $X = $Y ⊕ $Z or $X = $Y ⊕ Z.

• ANDN $X,$Y,$Z|Z ‘bitwise and-not’. 

Each bit of register Y is logically anded with the complement of the corresponding 
bit of register Z or of the constant Z, and the result is placed in register X. In other 
words, a bit of register X is set to 1 if and only if the corresponding bit of register Y 
is 1 and the other corresponding bit is 0; in symbols, $X = $Y \ $Z or $X = $Y \ Z. 
(This is the logical difference operation; if the operands are bit strings representing 
sets, we are computing the elements that lie in one set but not the other.) 

• ORN $X,$Y,$Z|Z ‘bitwise or-not’. 

Each bit of register Y is logically ored with the complement of the corresponding bit 
of register Z or of the constant Z, and the result is placed in register X. In other 
words, a bit of register X is set to 1 if and only if the corresponding bit of register Y 
is greater than or equal to the other corresponding bit; in symbols, $X = $Y ∨ ¬$Z or
$X = $Y ∨ ¬Z. (This is the complement of $Z \ $Y or Z \ $Y.)
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• NAND $X,$Y,$Z|Z ‘bitwise not-and’. 

Each bit of register Y is logically anded with the corresponding bit of register Z or 
of the constant Z, and the complement of the result is placed in register X. In other 
words, a bit of register X is set to 0 if and only if the corresponding bits of the 
operands are both 1; in symbols, $X = ¬($Y ∧ $Z) or $X = ¬($Y ∧ Z).

• NOR $X,$Y,$Z|Z ‘bitwise not-or’. 

Each bit of register Y is logically ored with the corresponding bit of register Z or of the 
constant Z, and the complement of the result is placed in register X. In other words, 
a bit of register X is set to 1 if and only if the corresponding bits of the operands are
both 0; in symbols, $X = ¬($Y ∨ $Z) or $X = ¬($Y ∨ Z).

• NXOR $X,$Y,$Z|Z ‘bitwise not-exclusive-or’.

Each bit of register Y is logically xored with the corresponding bit of register Z or 
of the constant Z, and the complement of the result is placed in register X. In other 
words, a bit of register X is set to 1 if and only if the corresponding bits of the 
operands are equal; in symbols, $X = ¬($Y ⊕ $Z) or $X = ¬($Y ⊕ Z).

• MUX $X,$Y,$Z|Z ‘bitwise multiplex’. 

For each bit position j, the jth bit of register X is set either to bit j of register Y or 
to bit j of the other operand $Z or Z, depending on whether bit j of the special mask 
register rM is 1 or 0; if M_j then Y_j else Z_j. In symbols, $X = ($Y ∧ rM) ∨ ($Z ∧ ¬rM)
or $X = ($Y ∧ rM) ∨ (Z ∧ ¬rM). (MMIX has several such special registers, associated
with instructions that need more than two inputs or produce more than one output.) 
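
The less familiar members of this family can be modeled in a few lines of Python. This is only an illustrative sketch, not part of MMIX: the function names and the constant MASK64 are assumptions introduced here, and Python's unbounded integers are masked to 64 bits to imitate a register.

```python
MASK64 = (1 << 64) - 1          # assumed helper: keeps every result within 64 bits

def andn(y, z):                  # logical difference $Y \ $Z
    return y & ~z & MASK64

def orn(y, z):                   # $Y or-ed with the complement of $Z
    return (y | ~z) & MASK64

def nand(y, z):                  # complement of (y and z)
    return ~(y & z) & MASK64

def nor(y, z):                   # complement of (y or z)
    return ~(y | z) & MASK64

def nxor(y, z):                  # 1 exactly where the operand bits are equal
    return ~(y ^ z) & MASK64

def mux(y, z, rM):               # bit j comes from y where rM has a 1, else from z
    return ((y & rM) | (z & ~rM)) & MASK64
```

For instance, mux(0xAAAA, 0x5555, 0xFF00) selects the upper byte from the first operand and the lower byte from the second, giving 0xAA55.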

11. Besides the eighteen bitwise operations, MMIX can also perform unsigned bytewise
and biggerwise operations that are somewhat more exotic.

• BDIF $X,$Y,$Z|Z ‘byte difference’. 

For each byte position j, the jth byte of register X is set to byte j of register Y minus 
byte j of the other operand $Z or Z, unless that difference is negative; in the latter 
case, byte j of $X is set to zero. 

• WDIF $X,$Y,$Z|Z ‘wyde difference’. 

For each wyde position j, the jth wyde of register X is set to wyde j of register Y 
minus wyde j of the other operand $Z or Z, unless that difference is negative; in the 
latter case, wyde j of $X is set to zero. 

• TDIF $X,$Y,$Z|Z ‘tetra difference’. 

For each tetra position j, the jth tetra of register X is set to tetra j of register Y 
minus tetra j of the other operand $Z or Z, unless that difference is negative; in the 
latter case, tetra j of $X is set to zero. 

• ODIF $X,$Y,$Z|Z ‘octa difference’. 

Register X is set to register Y minus the other operand $Z or Z, unless $Z or Z exceeds 
register Y; in the latter case, $X is set to zero. The operands are treated as unsigned 
integers. 

The BDIF and WDIF commands are useful in applications to graphics or video; 
TDIF and ODIF are also present for reasons of consistency. For example, if a and 
b are registers containing 8-byte quantities, their bytewise maxima c and bytewise 
minima d are computed by 

BDIF x,a,b; ADDU c,x,b; SUBU d,a,x; 

similarly, the individual “pixel differences” e, namely the absolute values of the 
differences of corresponding bytes, are computed by 

BDIF x,a,b; BDIF y,b,a; OR e,x,y. 

To add individual bytes of a and b while clipping all sums to 255 if they don’t fit in 
a single byte, one can say 

NOR acomp,a,0; BDIF x,acomp,b; NOR clippedsums,x,0;

in other words, complement a, apply BDIF, and complement the result. The opera- 
tions can also be used to construct efficient operations on strings of bytes or wydes. 

Exercise: Implement a “nybble difference” instruction that operates in a similar 
way on sixteen nybbles at a time. 

Answer: AND x,a,m; AND y,b,m; ANDN xx,a,m; ANDN yy,b,m; BDIF x,x,y; BDIF
xx,xx,yy; OR ans,x,xx where register m contains the mask #0f0f0f0f0f0f0f0f.

(The ANDN operation can be regarded as a “bit difference” instruction that operates 
in a similar way on 64 bits at a time.) 
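
The BDIF idioms above can be checked with a small Python model. It is a sketch under stated assumptions: the helper names (bdif, byte_max_min, pixel_diff, clipped_sum, nybble_diff) are invented here for illustration, and each function simply mirrors the instruction sequence quoted in the text.

```python
MASK64 = (1 << 64) - 1   # assumed helper: a register holds 64 bits

def bdif(y, z):
    """Bytewise saturating difference: max(0, y_j - z_j) for each of the 8 bytes."""
    result = 0
    for shift in range(0, 64, 8):
        yb = (y >> shift) & 0xFF
        zb = (z >> shift) & 0xFF
        result |= max(0, yb - zb) << shift
    return result

def byte_max_min(a, b):
    x = bdif(a, b)                 # BDIF x,a,b
    c = (x + b) & MASK64           # ADDU c,x,b  (no carries between bytes can occur)
    d = (a - x) & MASK64           # SUBU d,a,x  (no borrows between bytes can occur)
    return c, d

def pixel_diff(a, b):
    return bdif(a, b) | bdif(b, a)  # BDIF x,a,b; BDIF y,b,a; OR e,x,y

def clipped_sum(a, b):
    acomp = ~a & MASK64            # NOR acomp,a,0
    x = bdif(acomp, b)             # BDIF x,acomp,b
    return ~x & MASK64             # NOR clippedsums,x,0

def nybble_diff(a, b):
    m = 0x0F0F0F0F0F0F0F0F
    x, y = a & m, b & m                          # low nybbles
    xx, yy = a & ~m & MASK64, b & ~m & MASK64    # high nybbles
    return bdif(x, y) | bdif(xx, yy)
```

With a = 0x10ff and b = 0x2001, the bytewise maxima are 0x20ff, the minima are 0x1001, and the clipped sums are 0x30ff.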

12. Three more pairs of bit-fiddling instructions round out the collection of exotics.

• SADD $X,$Y,$Z|Z ‘sideways add’. 

Each bit of register Y is logically anded with the complement of the corresponding 
bit of register Z or of the constant Z, and the number of 1 bits in the result is placed 
in register X. In other words, register X is set to the number of bit positions in which 
register Y has a 1 and the other operand has a 0; in symbols, $X = ν($Y \ $Z) or
$X = ν($Y \ Z). When the second operand is zero this operation is sometimes called
“population counting,” because it counts the number of 1s in register Y.
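
As a quick illustration, here is a Python sketch of SADD; the name sadd is an assumption made here, and the function merely counts the 1 bits of $Y \ $Z within a 64-bit word.

```python
def sadd(y, z=0):
    """Sideways add: the number of bit positions where y has a 1 and z has a 0."""
    mask64 = (1 << 64) - 1
    return bin(y & ~z & mask64).count("1")
```

With z = 0 this is ordinary population counting: sadd(0xFF) is 8, while sadd(0xFF, 0x0F) is 4.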

• MOR $X,$Y,$Z|Z ‘multiple or’.

Suppose the 64 bits of register Y are indexed as 

y_00 y_01 … y_07 y_10 y_11 … y_17 … y_70 y_71 … y_77;

in other words, y_ij is the jth bit of the ith byte, if we number the bits and bytes from
0 to 7 in big-endian fashion from left to right. Let the bits of the other operand, $Z
or Z, be indexed similarly:

z_00 z_01 … z_07 z_10 z_11 … z_17 … z_70 z_71 … z_77.

The MOR operation replaces each bit x_ij of register X by the bit

y_0j z_i0 ∨ y_1j z_i1 ∨ ⋯ ∨ y_7j z_i7.



Thus, for example, if register Z contains the constant #0102040810204080, MOR
reverses the order of the bytes in register Y, converting between little-endian and
big-endian addressing. (The ith byte of $X depends on the bytes of $Y as specified
by the ith byte of $Z or Z. If we regard 64-bit words as 8 × 8 Boolean matrices, with
one byte per column, this operation computes the Boolean product $X = $Y $Z or
$X = $Y Z. Alternatively, if we regard 64-bit words as 8 × 8 matrices with one byte
per row, MOR computes the Boolean product $X = $Z $Y or $X = Z $Y with operands
in the opposite order. The immediate form MOR $X,$Y,Z always sets the leading seven
bytes of register X to zero; the other byte is set to the bitwise or of whatever bytes
of register Y are specified by the immediate operand Z.)

Exercise: Explain how to compute a mask m that is #ff in byte positions where
a exceeds b, #00 in all other bytes. Answer: BDIF x,a,b; MOR m,minusone,x; here
minusone is a register consisting of all 1s. (Moreover, if we AND this result with
#8040201008040201, then MOR with Z = 255, we get a one-byte encoding of m.)

• MXOR $X,$Y,$Z|Z ‘multiple exclusive-or’.

This operation is like the Boolean multiplication just discussed, but exclusive-or is 
used to combine the bits. Thus we obtain a matrix product over the field of two 
elements instead of a Boolean matrix product. This operation can be used to construct 
hash functions, among many other things. (The hash functions aren’t bad, but they 
are not “universal” in the sense of Sorting and Searching, exercise 6.4-72.) 
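
The two matrix operations can be modeled directly from the bit-level definition. The following Python sketch is written for clarity rather than speed; the names _bit, _matrix_op, mor, and mxor are assumptions introduced for this illustration.

```python
def _bit(v, i, j):
    """Bit j of byte i of the 64-bit value v, numbering both from the left (big-endian)."""
    return (v >> (63 - (8 * i + j))) & 1

def _matrix_op(y, z, combine):
    # x_ij is obtained by combining the products y_kj * z_ik for k = 0..7.
    x = 0
    for i in range(8):
        for j in range(8):
            bit = 0
            for k in range(8):
                bit = combine(bit, _bit(y, k, j) & _bit(z, i, k))
            x |= bit << (63 - (8 * i + j))
    return x

def mor(y, z):
    return _matrix_op(y, z, lambda a, b: a | b)   # Boolean product

def mxor(y, z):
    return _matrix_op(y, z, lambda a, b: a ^ b)   # product over the field of two elements
```

With z = 0x0102040810204080 both functions reverse the bytes of y, and with z = 0x8040201008040201 (the identity matrix) they leave y unchanged. The two differ only when several selected bytes overlap: mor(0x0303000000000000, 0xff) is 0x03, but mxor of the same operands is 0.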

13. Sixteen “immediate wyde” instructions are available for the common case that
a 16-bit constant is needed. In this case the Y and Z fields of the instruction are 
regarded as a single 16-bit unsigned number YZ. 

• SETH $X,YZ ‘set to high wyde’; SETMH $X,YZ ‘set to medium high wyde’; 
SETML $X,YZ ‘set to medium low wyde’; SETL $X,YZ ‘set to low wyde’. 

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, re- 
spectively, and placed into register X. Thus, for example, SETML inserts a given value 
into the second-least-significant wyde of register X and sets the other three wydes to 
zero. 

• INCH $X,YZ ‘increase by high wyde’; INCMH $X,YZ ‘increase by medium high wyde’;
INCML $X,YZ ‘increase by medium low wyde’; INCL $X,YZ ‘increase by low wyde’.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits,
respectively, and added to register X, ignoring overflow; the result is placed back into 
register X. 

If YZ is the hexadecimal constant #8000, the command INCH $X,YZ complements
the most significant bit of register X. We will see below that this can be used to 
negate a floating point number. 

• ORH $X,YZ ‘bitwise or with high wyde’; ORMH $X,YZ ‘bitwise or with medium high
wyde’; ORML $X,YZ ‘bitwise or with medium low wyde’; ORL $X,YZ ‘bitwise or with
low wyde’.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, 
respectively, and ored with register X; the result is placed back into register X. 

Notice that any desired 4-wyde constant GH IJ KL MN can be inserted into a register
with a sequence of four instructions such as 

SETH $X,GH; INCMH $X,IJ; INCML $X,KL; INCL $X,MN; 

any of these INC instructions could also be replaced by OR.

• ANDNH $X,YZ ‘bitwise and-not high wyde’; ANDNMH $X,YZ ‘bitwise and-not medium
high wyde’; ANDNML $X,YZ ‘bitwise and-not medium low wyde’; ANDNL $X,YZ ‘bitwise
and-not low wyde’.

The 16-bit unsigned number YZ is shifted left by either 48 or 32 or 16 or 0 bits, 
respectively, then complemented and anded with register X; the result is placed back 
into register X. 

If YZ is the hexadecimal constant #8000, the command ANDNH $X,YZ forces the
most significant bit of register X to be 0. This can be used to compute the absolute 
value of a floating point number. 
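
The immediate wyde instructions are simple enough to model in Python. In this sketch the names SHIFT, set_w, inc_w, or_w, and andn_w are assumptions made here; the first argument of inc_w, or_w, and andn_w stands for the current contents of register X, and the string argument selects the H, MH, ML, or L variant.

```python
MASK64 = (1 << 64) - 1
SHIFT = {"H": 48, "MH": 32, "ML": 16, "L": 0}   # shift amount for each wyde position

def set_w(which, yz):                 # SETH, SETMH, SETML, SETL
    return (yz & 0xFFFF) << SHIFT[which]

def inc_w(x, which, yz):              # INCH, INCMH, INCML, INCL (overflow ignored)
    return (x + ((yz & 0xFFFF) << SHIFT[which])) & MASK64

def or_w(x, which, yz):               # ORH, ORMH, ORML, ORL
    return x | ((yz & 0xFFFF) << SHIFT[which])

def andn_w(x, which, yz):             # ANDNH, ANDNMH, ANDNML, ANDNL
    return x & ~((yz & 0xFFFF) << SHIFT[which]) & MASK64

def load_constant(gh, ij, kl, mn):
    """SETH, INCMH, INCML, INCL in sequence, as in the text."""
    x = set_w("H", gh)
    x = inc_w(x, "MH", ij)
    x = inc_w(x, "ML", kl)
    return inc_w(x, "L", mn)
```

For example, load_constant(0x0123, 0x4567, 0x89ab, 0xcdef) yields 0x0123456789abcdef, and inc_w(0x3ff0000000000000, "H", 0x8000) flips the leading bit to give 0xbff0000000000000.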

14. MMIX knows several ways to shift a register left or right by any number of bits. 

• SL $X,$Y,$Z|Z ‘shift left’. 

The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from
the right; the result is placed in register X. Register Y is treated as a signed number,
but the second operand is treated as an unsigned number. The effect is the same as
multiplication by 2^$Z or by 2^Z; an integer overflow exception occurs if the result is
≥ 2^63 or < −2^63. In particular, if the second operand is 64 or more, register X will
become entirely zero, and integer overflow will be signaled unless register Y was zero.

• SLU $X,$Y,$Z|Z ‘shift left unsigned’.

The bits of register Y are shifted left by $Z or Z places, and 0s are shifted in from
the right; the result is placed in register X. Both operands are treated as unsigned 
numbers. The SLU instructions are equivalent to SL, except that no test for overflow 
is made. 

• SR $X,$Y,$Z|Z ‘shift right’. 

The bits of register Y are shifted right by $Z or Z places, and copies of the leftmost bit 
(the sign bit) are shifted in from the left; the result is placed in register X. Register Y 
is treated as a signed number, but the second operand is treated as an unsigned 
number. The effect is the same as division by 2^$Z or by 2^Z and rounding down. In
particular, if the second operand is 64 or more, register X will become zero if $Y was
nonnegative, −1 if $Y was negative.

• SRU $X,$Y,$Z|Z ‘shift right unsigned’. 

The bits of register Y are shifted right by $Z or Z places, and 0s are shifted in from
the left; the result is placed in register X. Both operands are treated as unsigned
numbers. The effect is the same as unsigned division of a 64-bit number by 2^$Z or
by 2^Z; if the second operand is 64 or more, register X will become entirely zero.
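
The four shifts differ only in how they treat the sign and overflow, which a short Python model makes explicit. This is a sketch: the names to_signed, sl, slu, sr, and sru are assumptions introduced here, and sl returns the overflow condition as a second value instead of raising an exception.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a two's complement integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def sl(y, z):
    """SL: returns (64-bit result, overflow flag)."""
    z = min(z, 64)                       # shifting by 64 or more clears the register
    exact = to_signed(y) << z            # the exact product y * 2**z
    overflow = not (-(1 << 63) <= exact < (1 << 63))
    return exact & MASK64, overflow

def slu(y, z):
    return ((y & MASK64) << min(z, 64)) & MASK64

def sr(y, z):
    # Python's >> on negative integers is arithmetic, so it rounds down.
    return (to_signed(y) >> min(z, 64)) & MASK64

def sru(y, z):
    return (y & MASK64) >> min(z, 64)
```

Thus sl(1, 63) reports overflow, because 2^63 is not representable as a signed octabyte, whereas shifting −1 left by 63 places gives exactly −2^63 without overflow.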

15. Comparisons. Arithmetic and logical operations are nice, but computer
programs also need to compare numbers and to change the course of a calculation 
depending on what they find. MMIX has four comparison instructions to facilitate such 
decision-making. 

• CMP $X,$Y,$Z|Z ‘compare’. 

Register X is set to −1 if register Y is less than register Z or less than the unsigned
immediate value Z, using the conventions of signed arithmetic; it is set to 0 if register Y 
is equal to register Z or equal to the unsigned immediate value Z; otherwise it is set 
to 1. In symbols, $X = [$Y>$Z] - [$Y<$Z] or $X = [$Y>Z] - [$Y<Z]. 

• CMPU $X,$Y,$Z|Z ‘compare unsigned’. 

Register X is set to −1 if register Y is less than register Z or less than the unsigned
immediate value Z, using the conventions of unsigned arithmetic; it is set to 0 if 
register Y is equal to register Z or equal to the unsigned immediate value Z; otherwise 
it is set to 1. In symbols, $X = [$Y>$Z] - [$Y<$Z] or $X = [$Y>Z] - [$Y<Z]. 
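
The bracket formula translates almost literally into Python, where a true comparison counts as 1 and a false one as 0. The sketch below uses hypothetical names (to_signed, cmp_signed, cmp_unsigned) and returns an ordinary Python integer −1, 0, or 1 rather than a 64-bit register pattern.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a two's complement integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def cmp_signed(y, z):                 # CMP: [y > z] - [y < z], signed
    ys, zs = to_signed(y), to_signed(z)
    return (ys > zs) - (ys < zs)

def cmp_unsigned(y, z):               # CMPU: the same formula, unsigned
    y &= MASK64
    z &= MASK64
    return (y > z) - (y < z)
```

The same bit pattern can compare differently under the two conventions: an octabyte of all 1s is less than zero for CMP but greater than zero for CMPU.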

16. There also are 32 conditional instructions, which choose quickly between two
alternative courses of action. 

• CSN $X,$Y,$Z|Z ‘conditionally set if negative’. 

If register Y is negative (namely if its most significant bit is 1), register X is set to 
the contents of register Z or to the unsigned immediate value Z. Otherwise nothing 
happens. 

• CSZ $X,$Y,$Z|Z ‘conditionally set if zero’. 

• CSP $X,$Y,$Z|Z ‘conditionally set if positive’. 

• CSOD $X,$Y,$Z|Z ‘conditionally set if odd’. 

• CSNN $X,$Y,$Z|Z ‘conditionally set if nonnegative’. 

• CSNZ $X,$Y,$Z|Z ‘conditionally set if nonzero’. 

• CSNP $X,$Y,$Z|Z ‘conditionally set if nonpositive’. 

• CSEV $X,$Y,$Z|Z ‘conditionally set if even’. 

These instructions are entirely analogous to CSN, except that register X changes only 
if register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or 
nonodd. 

• ZSN $X,$Y,$Z|Z ‘zero or set if negative’. 

If register Y is negative (namely if its most significant bit is 1), register X is set to 
the contents of register Z or to the unsigned immediate value Z. Otherwise register X 
is set to zero. 

• ZSZ $X,$Y,$Z|Z ‘zero or set if zero’. 

• ZSP $X,$Y,$Z|Z ‘zero or set if positive’. 

• ZSOD $X,$Y,$Z|Z ‘zero or set if odd’.

• ZSNN $X,$Y,$Z|Z ‘zero or set if nonnegative’. 

• ZSNZ $X,$Y,$Z|Z ‘zero or set if nonzero’. 

• ZSNP $X,$Y,$Z|Z ‘zero or set if nonpositive’. 

• ZSEV $X,$Y,$Z|Z ‘zero or set if even’. 

These instructions are entirely analogous to ZSN, except that $X is set to $Z or Z if 
register Y is respectively zero, positive, odd, nonnegative, nonzero, nonpositive, or 
even; otherwise $X is set to zero. 

Notice that the two instructions CMPU r,s,0 and ZSNZ r,s,1 have the same effect.
So do the two instructions CSNP r,s,0 and ZSP r,s,r. So do AND r,s,1 and
ZSOD r,s,1.
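
The equivalences just noticed can be verified with a small Python model of the conditional instructions. It is a sketch only: the table CONDITION and the functions cs and zs are names assumed here, with the condition passed as the mnemonic suffix (N, Z, P, OD, NN, NZ, NP, EV).

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a two's complement integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

CONDITION = {
    "N":  lambda v: v < 0,   "Z":  lambda v: v == 0,
    "P":  lambda v: v > 0,   "OD": lambda v: v % 2 == 1,
    "NN": lambda v: v >= 0,  "NZ": lambda v: v != 0,
    "NP": lambda v: v <= 0,  "EV": lambda v: v % 2 == 0,
}

def cs(cond, x, y, z):
    """CSxx: the new contents of register X, given its old contents x."""
    return z if CONDITION[cond](to_signed(y)) else x

def zs(cond, y, z):
    """ZSxx: z if the condition holds for y, otherwise zero."""
    return z if CONDITION[cond](to_signed(y)) else 0
```

For every s, zs("NZ", s, 1) equals 1 exactly when s is nonzero, which is also the result of an unsigned comparison of s with 0; cs("NP", r, s, 0) agrees with zs("P", s, r); and zs("OD", s, 1) equals s & 1.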

17. Branches and jumps. MMIX ordinarily executes instructions in sequence,
proceeding from an instruction in tetrabyte M_4[A] to the instruction in M_4[A + 4].
But there are several ways to interrupt the normal flow of control, most of which
use the Y and Z fields of an instruction as a combined 16-bit YZ field. For example,
BNZ $3,@+4000 (branch if nonzero) is typical: It means that control should skip ahead
1000 instructions to the command that appears 4000 bytes after the BNZ, if register 3
is not equal to zero.

There are eight branch-forward instructions, corresponding to the eight conditions 
in the CS and ZS commands that we discussed earlier. And there are eight similar 
branch-backward instructions; for example, BOD $2,@-4000 (branch if odd) takes
control to the instruction that appears 4000 bytes before this BOD command, if
register 2 is odd. The numeric OP-code when branching backward is one greater than
the OP-code when branching forward; the assembler takes care of this automatically,
just as it takes care of changing ADD from 32 to 33 when necessary.

Since branches are relative to the current location, the MMIX assembler treats branch 
instructions in a special way. Suppose a programmer writes ‘BNZ $3,Case5’, where
Case5 is the address of an instruction in location l. If this instruction appears in
location A, the assembler first computes the displacement δ = ⌊(l − A)/4⌋. Then if
δ is nonnegative, the quantity δ is placed in the YZ field of a BNZ command, and it
should be less than 2^16; if δ is negative, the quantity 2^16 + δ is placed in the YZ field
of a BNZ command with OP-code increased by 1, and δ should not be less than −2^16.

The symbol @ used in our examples of BNZ and BOD above is interpreted by the 
assembler as an abbreviation for “the location of the current instruction.” In the 
following notes we will define pairs of branch commands by writing, for example, 
‘BNZ $X,@+4*YZ [-262144]’; this stands for a branch-forward command that branches
to the current location plus four times YZ, as well as for a branch-backward command
that branches to the current location plus four times (YZ − 65536).

• BN $X,@+4*YZ [-262144] ‘branch if negative’.

• BZ $X,@+4*YZ [-262144] ‘branch if zero’.

• BP $X,@+4*YZ [-262144] ‘branch if positive’.

• BOD $X,@+4*YZ [-262144] ‘branch if odd’.

• BNN $X,@+4*YZ [-262144] ‘branch if nonnegative’.

• BNZ $X,@+4*YZ [-262144] ‘branch if nonzero’.

• BNP $X,@+4*YZ [-262144] ‘branch if nonpositive’.

• BEV $X,@+4*YZ [-262144] ‘branch if even’.

If register X is respectively negative, zero, positive, odd, nonnegative, nonzero,
nonpositive, or even, and if this instruction appears in memory location A, the next
instruction is taken from memory location A + 4YZ (branching forward) or A + 4(YZ − 2^16)
(branching backward). Thus one can go from location A to any location between
A − 262,144 and A + 262,140, inclusive.
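
The assembler's treatment of a branch target can be sketched in Python as follows. The function names encode_branch and branch_target are assumptions made for this illustration, and base_opcode stands for whatever numeric OP-code the forward form of the branch has; its actual value is not needed here.

```python
def encode_branch(target, here, base_opcode):
    """Return (opcode, YZ) for a branch located at `here` that goes to `target`."""
    delta = (target - here) // 4              # floor of (target - here)/4
    if 0 <= delta < (1 << 16):
        return base_opcode, delta             # branch forward
    if -(1 << 16) <= delta < 0:
        return base_opcode + 1, (1 << 16) + delta   # branch backward
    raise ValueError("branch target out of range")

def branch_target(here, backward, yz):
    """Inverse direction: the location reached from `here` with a given YZ field."""
    return here + 4 * (yz - ((1 << 16) if backward else 0))
```

A branch 4000 bytes ahead gets YZ = 1000 with the forward OP-code, while a branch 4000 bytes back gets YZ = 65536 − 1000 = 64536 with the OP-code increased by 1. The extreme cases YZ = 65535 forward and YZ = 0 backward give the limits A + 262,140 and A − 262,144.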

Sixteen additional branch instructions called probable branches are also provided. 
They have exactly the same meaning as ordinary branch instructions; for example, 
PBOD $2,@-4000 and BOD $2,@-4000 both go backward 4000 bytes if register 2 is
odd. But they differ in running time: On some implementations of MMIX, a branch
instruction takes longer when the branch is taken, while a probable branch takes
longer when the branch is not taken. Thus programmers should use a B instruction 
when they think branching is relatively unlikely, but they should use PB when they 
expect branching to occur more often than not. Here is a list of the probable branch 
commands, for completeness: 



• PBN $X,@+4*YZ [-262144] ‘probable branch if negative’.
• PBZ $X,@+4*YZ [-262144] ‘probable branch if zero’.
• PBP $X,@+4*YZ [-262144] ‘probable branch if positive’.
• PBOD $X,@+4*YZ [-262144] ‘probable branch if odd’.
• PBNN $X,@+4*YZ [-262144] ‘probable branch if nonnegative’.
• PBNZ $X,@+4*YZ [-262144] ‘probable branch if nonzero’.
• PBNP $X,@+4*YZ [-262144] ‘probable branch if nonpositive’.
• PBEV $X,@+4*YZ [-262144] ‘probable branch if even’.



18. Locations that are relative to the current instruction can be transformed into
absolute locations with GETA commands. 



• GETA $X,@+4*YZ [-262144] ‘get address’. 

The value A + 4YZ or A + 4(YZ − 2^16) is placed in register X. (The assembly language
conventions of branch instructions apply; for example, we can write ‘GETA $X,Addr’.) 



19. MMIX also has unconditional jump instructions, which change the location of
the next instruction no matter what. 

• JMP @+4*XYZ [-67108864] ‘jump’. 

A JMP command treats bytes X, Y, and Z as an unsigned 24-bit integer XYZ. It allows 
a program to transfer control from location A to any location between A − 67,108,864
and A + 67,108,860 inclusive, using relative addressing as in the B and PB commands. 

• GO $X,$Y,$Z|Z ‘go to location’. 

MMIX takes its next instruction from location $Y + $Z or $Y + Z, and continues from 
there. Register X is set equal to A + 4, the location of the instruction that would
ordinarily have been executed next. (GO is similar to a jump, but it is not relative 
to the current location. Since GO has the same format as a load or store instruction, 
a loading routine can treat program labels with the same mechanism that is used to 
treat references to data.) 

An old-fashioned type of subroutine linkage can be implemented by saying either 
‘GO r,subloc,0’ or ‘GETA r,@+8; JMP Sub’ to enter a subroutine, then ‘GO r,r,0’ to 
return. But subroutines are normally entered with the instructions PUSHJ or PUSHGO. 

The two least significant bits of the address in a GO command are essentially ignored. 
They will, however, appear in the value of A returned by GETA instructions, and in the 
return-jump register rJ after PUSHJ or PUSHGO instructions are performed, and in the 
where-interrupted register at the time of an interrupt. Therefore they could be used 
to send some kind of signal to a subroutine or (less likely) to an interrupt handler. 



20. Multiplication and division. Now for some instructions that make MMIX
work harder. 

• MUL $X,$Y,$Z|Z ‘multiply’. 

The signed product of the number in register Y by either the number in register Z or 
the unsigned byte Z replaces the contents of register X. An integer overflow exception 
can occur, as with ADD or SUB, if the result is less than −2^63 or greater than 2^63 − 1.
(Immediate multiplication by powers of 2 can be done more rapidly with the SL 
instruction.) 

• MULU $X,$Y,$Z|Z ‘multiply unsigned’. 

The lower 64 bits of the unsigned 128-bit product of register Y and either register Z 
or Z are placed in register X, and the upper 64 bits are placed in the special himult 
register rH. (Immediate multiplication by powers of 2 can be done more rapidly with 
the SLU instruction, if the upper half is not needed. Furthermore, an instruction like 
4ADDU $X,$Y,$Y is faster than MULU $X,$Y,5.) 

• DIV $X,$Y,$Z|Z ‘divide’.

The signed quotient of the number in register Y divided by either the number in 
register Z or the unsigned byte Z replaces the contents of register X, and the signed 
remainder is placed in the special remainder register rR. An integer divide check 
exception occurs if the divisor is zero; in that case $X is set to zero and rR is set 
to $Y. An integer overflow exception occurs if the number −2^63 is divided by −1;
otherwise integer overflow is impossible. The quotient of y divided by z is defined to
be ⌊y/z⌋, and the remainder is defined to be y − ⌊y/z⌋z (also written y mod z). Thus,
the remainder is either zero or has the sign of the divisor. Dividing by z = 2^t gives
exactly the same quotient as shifting right t via the SR command, and exactly the
same remainder as anding with z − 1 via the AND command. Division of a positive
63-bit number by a positive constant can be accomplished more quickly by computing 
the upper half of a suitable unsigned product and shifting it right appropriately. 

• DIVU $X,$Y,$Z|Z ‘divide unsigned’. 

The unsigned 128-bit number obtained by prefixing the special dividend register rD to
the contents of register Y is divided either by the unsigned number in register Z or by 
the unsigned byte Z, and the quotient is placed in register X. The remainder is placed 
in the remainder register rR. However, if rD is greater than or equal to the divisor 
(and in particular if the divisor is zero), then $X is set to rD and rR is set to $Y. 
(Unsigned arithmetic never signals an exceptional condition, even when dividing by 
zero.) If rD is zero, unsigned division by z = 2^t gives exactly the same quotient
as shifting right t via the SRU command, and exactly the same remainder as anding
with z − 1 via the AND command. Section 4.3.1 of Seminumerical Algorithms explains
how to use unsigned division to obtain the quotient and remainder of extremely large 
numbers. 
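
The conventions for the special registers rH, rR, and rD can be summarized by a Python sketch. The names mulu, div, and divu are assumptions made here; each function returns a pair instead of setting registers, and the overflow and divide check exceptions themselves are not modeled. Python's divmod happens to use the same floor convention as MMIX, so the remainder takes the sign of the divisor.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a two's complement integer."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def mulu(y, z):
    """MULU: returns ($X, rH), the low and high halves of the 128-bit product."""
    p = (y & MASK64) * (z & MASK64)
    return p & MASK64, p >> 64

def div(y, z):
    """DIV: returns ($X, rR) as signed Python integers."""
    ys, zs = to_signed(y), to_signed(z)
    if zs == 0:
        return 0, ys                   # divide check: $X = 0, rR = $Y
    return divmod(ys, zs)              # floor quotient; remainder has the divisor's sign

def divu(rD, y, z):
    """DIVU: divides the 128-bit number (rD, y) by z; returns ($X, rR)."""
    if rD >= z:                        # includes the case z == 0
        return rD, y
    return divmod((rD << 64) | y, z)
```

For example, div applied to −7 and 2 gives quotient −4 and remainder 1, while 7 divided by −2 gives quotient −4 and remainder −1.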

21. Floating point computations. Floating point arithmetic conforming to the
famous IEEE/ANSI Standard 754 is provided for arbitrary 64-bit numbers. The IEEE
standard refers to such numbers as “double format” quantities, but MMIX calls them 
simply floating point numbers because 64-bit quantities are the norm. 

A positive floating point number has 53 bits of precision and can range from
approximately 10^−308 to 10^308. “Subnormal numbers” between 10^−324 and 10^−308
can also be represented, but with fewer bits of precision. Floating point numbers
can be infinite, and they satisfy such identities as 1.0/∞ = +0.0, −2.8 × ∞ = −∞.
Floating point quantities can also be “Not-a-Numbers” or NaNs, which are further 
classified into signaling NaNs and quiet NaNs. 

Five kinds of exceptions can occur during floating point computations, and they 
each have code letters: Floating overflow (O) or underflow (U); floating divide by 
zero (Z); floating inexact (X); and floating invalid (I). For example, the multiplication 
of sufficiently small integers causes no exceptions, and the division of 91.0 by 13.0 is 
also exception-free, but the division 1.0/3.0 is inexact. The multiplication of extremely
large or extremely small floating point numbers is inexact and it also causes overflow
or underflow. Invalid results occur when taking the square root of a negative number;
mathematicians can remember the I exception by relating it to the square root of −1.0.
Invalid results also occur when trying to convert infinity or a quiet NaN to a
fixed-point integer, or when any signaling NaN is encountered, or when mathematically
undefined operations like ∞ − ∞ or 0/0 are requested. (Programmers can be sure
that they have not erroneously used uninitialized floating point data if they initialize 
all their variables to signaling NaN values.) 

Four different rounding modes for inexact results are available: round to nearest 
(and to even in case of ties); round off (toward zero); round up (toward +∞); or round
down (toward −∞). MMIX has a special arithmetic status register rA that specifies
the current rounding mode and the user’s current preferences for exception handling. 

IEEE standard arithmetic provides an excellent foundation for scientific calculations,
and it will be thoroughly explained in the fourth edition of Seminumerical
Algorithms, Section 4.2. For our present purposes, we need not study all the details;
but we do need to specify MMIX’s behavior with respect to several things that are not 
completely defined by the standard. For example, the IEEE standard does not fully 
define the result of operations with NaNs. 

When an octabyte represents a floating point number in MMIX’s registers, the 
leftmost bit is the sign; then come 11 bits for an exponent e; and the remaining 52 bits
are the fraction part f. We regard e as an integer between 0 and (11111111111)_2 =
2047, and we regard f as a fraction between 0 and (.111…1)_2 = 1 − 2^−52. Each
octabyte has the following significance: 



    ±0.0,                  if e = f = 0 (zero);
    ±2^−1022 f,            if e = 0 and f > 0 (subnormal);
    ±2^(e−1023) (1 + f),   if 0 < e < 2047 (normal);
    ±∞,                    if e = 2047 and f = 0 (infinite);
    ±NaN(f),               if e = 2047 and 0 < f < 1/2 (signaling NaN);
    ±NaN(f),               if e = 2047 and f ≥ 1/2 (quiet NaN).

Notice that +0.0 is distinguished from −0.0; this fact is important for interval
arithmetic.

Exercise: What 64 bits represent the floating point number 1.0? Answer: We want
e = 1023 and f = 0, so the answer is #3ff0000000000000.

Exercise: What is the largest finite floating point number? Answer: We want
e = 2046 and f = 1 − 2^−52, so the answer is #7fefffffffffffff = 2^1024 − 2^971.
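
The six cases of the format, and the two exercises, can be checked with exact rational arithmetic. The following Python sketch uses a hypothetical function decode that classifies a 64-bit pattern and returns its exact value as a Fraction (or None for infinities and NaNs); the sign of zero is not preserved by this representation.

```python
from fractions import Fraction

def decode(octa):
    """Classify a 64-bit pattern; return (kind, exact value or None)."""
    sign = -1 if octa >> 63 else 1
    e = (octa >> 52) & 0x7FF                        # the 11-bit exponent field
    f = Fraction(octa & ((1 << 52) - 1), 1 << 52)   # the fraction, 0 <= f < 1
    if e == 0:
        kind = "zero" if f == 0 else "subnormal"
        return kind, sign * f * Fraction(2) ** -1022
    if e < 2047:
        return "normal", sign * (1 + f) * Fraction(2) ** (e - 1023)
    if f == 0:
        return "infinite", None
    kind = "signaling NaN" if f < Fraction(1, 2) else "quiet NaN"
    return kind, None
```

Here decode(0x3ff0000000000000) yields the normal number 1, and decode(0x7fefffffffffffff) yields exactly 2^1024 − 2^971.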

22. The seven IEEE floating point arithmetic operations (addition, subtraction,
multiplication, division, remainder, square root, and nearest-integer) all share com- 
mon features, called the standard floating point conventions in the discussion below: 
The operation is performed on floating point numbers found in two registers, $Y 
and $Z, except that square root and integerization involve only one operand. If nei- 
ther input operand is a NaN, we first determine the exact result, then round it using 
the current rounding mode found in special register rA. Infinite results are exact and 
need no rounding. A floating overflow exception occurs if the rounded result is finite 
but needs an exponent greater than 2046. A floating underflow exception occurs if 
the rounded result needs an exponent less than 1 and either (i) the unrounded result 
cannot be represented exactly as a subnormal number or (ii) the “floating underflow 
trip” is enabled in rA. (Trips are discussed below.) NaNs are treated specially as 
follows: If either $Y or $Z is a signaling NaN, an invalid exception occurs and the 
NaN is quieted by adding 1/2 to its fraction part. Then if $Z is a quiet NaN, the 
result is set to $Z; otherwise if $Y is a quiet NaN, the result is set to $Y. (Registers 
$Y and $Z do not actually change.) 

• FADD $X,$Y,$Z ‘floating add’. 

The floating point sum $Y + $Z is computed by the standard floating point conventions
just described, and placed in register X. An invalid exception occurs if the sum is
(+∞) + (−∞) or (−∞) + (+∞); in that case the result is NaN(1/2) with the sign
of $Z. If the sum is exactly zero and the current mode is not rounding-down, the
result is +0.0 except that (−0.0) + (−0.0) = −0.0. If the sum is exactly zero and the
current mode is rounding-down, the result is −0.0 except that (+0.0) + (+0.0) = +0.0.
These rules for signed zeros turn out to be useful when doing interval arithmetic: If
the lower bound of an interval is +0.0 or if the upper bound is −0.0, the interval does
not contain zero, so the numbers in the interval have a known sign.

Floating point underflow cannot occur unless the U-trip has been enabled, because 
any underflowing result of floating point addition can be represented exactly as a 
subnormal number. 

Silly but instructive exercise: Find all pairs of numbers ($Y, $Z) such that the 
commands FADD $X,$Y,$Z and ADDU $X,$Y,$Z both produce the same result in $X 
(although FADD may cause floating exceptions). Answer: Of course $Y or $Z could 
be zero, if the other one is not a signaling NaN. Or one could be signaling and the 
other #0008000000000000. Other possibilities occur when they are both positive
and less than #0010000000000001; or when one operand is #0000000000000001 and
the other is an odd number between #0020000000000001 and #002ffffffffffffd
inclusive (rounding to nearest). And still more surprising possibilities exist, such as
#7f6001b4c67bc809 + #ff5ffb6a4534a3f7. All eight families of solutions will be
revealed some day in the fourth edition of Seminumerical Algorithms. 

• FSUB $X,$Y,$Z ‘floating subtract’. 

This instruction is equivalent to FADD, but with the sign of $Z negated unless $Z is 
a NaN. 

• FMUL $X,$Y,$Z ‘floating multiply’. 

The floating point product $Y × $Z is computed by the standard floating point
conventions, and placed in register X. An invalid exception occurs if the product is
(±0.0) × (±∞) or (±∞) × (±0.0); in that case the result is ±NaN(1/2). No exception
occurs for the product (±∞) × (±∞). If neither $Y nor $Z is a NaN, the sign of the
result is the product of the signs of $Y and $Z. 

• FDIV $X,$Y,$Z ‘floating divide’. 

The floating point quotient $Y/$Z is computed by the standard floating point
conventions, and placed in $X. A floating divide by zero exception occurs if the
quotient is (normal or subnormal)/(±0.0). An invalid exception occurs if the quotient is
(±0.0)/(±0.0) or (±∞)/(±∞); in that case the result is ±NaN(1/2). No exception
occurs for the quotient (±∞)/(±0.0). If neither $Y nor $Z is a NaN, the sign of the
result is the product of the signs of $Y and $Z. 

If a floating point number in register X is known to have an exponent between 2 
and 2046, the instruction INCH $X,#fff0 will divide it by 2.0. 
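
The halving trick works because adding #fff0 in the high wyde subtracts 1 from the exponent field, modulo 2^64, without disturbing the sign or the fraction. A Python sketch using the standard struct module shows the effect on actual bit patterns; the helper names to_bits, from_bits, and inch are assumptions made here.

```python
import struct

MASK64 = (1 << 64) - 1

def to_bits(x):
    """The 64-bit pattern of a Python float (an IEEE double)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b):
    """The float represented by a 64-bit pattern."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

def inch(x, yz):
    """INCH: add the 16-bit constant yz, shifted left 48 bits, ignoring overflow."""
    return (x + (yz << 48)) & MASK64
```

For example, from_bits(inch(to_bits(6.5), 0xfff0)) is 3.25. If the exponent field were 1, the result would have exponent 0 and would be read as a subnormal number without the implicit leading 1, which is why the exponent must be at least 2.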

• FREM $X,$Y,$Z ‘floating remainder’. 

The floating point remainder $Y rem $Z is computed by the standard floating point
conventions, and placed in register X. (The IEEE standard defines the remainder to
be $Y − n × $Z, where n is the nearest integer to $Y/$Z, and n is an even integer
in case of ties. This is not the same as the remainder $Y mod $Z computed by DIV
or DIVU.) A zero remainder has the sign of $Y. An invalid exception occurs if $Y is
infinite and/or $Z is zero; in that case the result is NaN(1/2) with the sign of $Y.
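
Python's math.remainder implements the same IEEE-style remainder, so it can be used to see how this differs from mod. The wrapper frem below is a hypothetical name and a simplification: it returns a NaN in the invalid cases instead of signaling an exception, and it does not model MMIX's treatment of signaling NaNs or the exact NaN payload.

```python
import math

def frem(y, z):
    """IEEE-style remainder y - n*z, with n the nearest integer to y/z (ties to even)."""
    if math.isnan(y) or math.isnan(z):
        return math.nan
    if math.isinf(y) or z == 0.0:
        return math.copysign(math.nan, y)   # invalid: a NaN with the sign of y
    return math.remainder(y, z)
```

Thus frem(5.0, 3.0) is −1.0, because the nearest integer to 5/3 is 2, whereas the floor-based 5.0 % 3.0 is 2.0. In the tie cases, frem(5.0, 2.0) is 1.0 (n = 2) and frem(7.0, 2.0) is −1.0 (n = 4).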

• FSQRT $X,$Z ‘floating square root’. 

The floating point square root √$Z is computed by the standard floating point
conventions, and placed in register X. An invalid exception occurs if $Z is a negative
number (either infinite, normal, or subnormal); in that case the result is −NaN(1/2).
No exception occurs when taking the square root of −0.0 or +∞. In all cases the sign
of the result is the sign of $Z.

The Y field of FSQRT can be used to specify a special rounding mode, as explained 
below. 

• FINT $X,$Z ‘floating integer’. 

The floating point number in register Z is rounded (if necessary) to a floating point 
integer, using the current rounding mode, and placed in register X. Infinite values 
and quiet NaNs are not changed; signaling NaNs are treated as in the standard 
conventions. Floating point overflow and underflow exceptions cannot occur. 

The Y field of FINT can be used to specify a special rounding mode, as explained
below.

23. Besides doing arithmetic, we need to compare floating point numbers with each
other, taking proper account of NaNs and the fact that −0.0 should be considered
equal to +0.0. The following instructions are analogous to the comparison operators
CMP and CMPU that we have used for integers.

• FCMP $X,$Y,$Z ‘floating compare’. 

Register X is set to −1 if $Y < $Z according to the conventions of floating point
arithmetic, or to 1 if $Y > $Z according to those conventions. Otherwise it is set 
to 0. An invalid exception occurs if either $Y or $Z is a NaN; in such cases the result 
is zero. 

• FEQL $X,$Y,$Z ‘floating equal to’. 

Register X is set to 1 if $Y = $Z according to the conventions of floating point 
arithmetic. Otherwise it is set to 0. The result is zero if either $Y or $Z is a NaN, 
even if a NaN is being compared with itself. However, no invalid exception occurs, 
not even when $Y or $Z is a signaling NaN. (Perhaps MMIX differs slightly from the 
IEEE standard in this regard, but programmers sometimes need to look at signaling 
NaNs without encountering side effects. Programmers who insist on raising an invalid 
exception whenever a signaling NaN is compared for floating equality should issue the 
instructions FSUB $X,$Y,$Y; FSUB $X,$Z,$Z just before saying FEQL $X,$Y,$Z.) 

Suppose w, x, y, and z are unsigned 64-bit integers with w < x < 2^63 ≤ y < z.
Thus, the leftmost bits of w and x are 0, while the leftmost bits of y and z are 1. Then
we have w < x < y < z when these numbers are considered as unsigned integers, but
y < z < w < x when they are considered as signed integers, because y and z are
negative. Furthermore, we have z < y ≤ w < x when these same 64-bit quantities are
considered to be floating point numbers, assuming that no NaNs are present, because
the leftmost bit of a floating point number represents its sign and the remaining bits
represent its magnitude. The case y = w occurs in floating point comparison if and
only if y is the representation of −0.0 and w is the representation of +0.0.
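
The three orderings can be checked on a concrete quadruple. In this sketch the choice of w, x, y, z as the bit patterns of 1.0, 2.0, −1.0, −2.0 is an assumption made purely for illustration; any values satisfying w < x < 2^63 ≤ y < z (and not NaNs) would do.

```python
import struct

def bits(x):
    """64-bit pattern of an IEEE double, as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def as_signed(b):
    """Reinterpret an unsigned 64-bit pattern as a two's complement integer."""
    return b - (1 << 64) if b >> 63 else b

def as_float(b):
    """Reinterpret an unsigned 64-bit pattern as an IEEE double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

w, x, y, z = bits(1.0), bits(2.0), bits(-1.0), bits(-2.0)
names = {w: "w", x: "x", y: "y", z: "z"}

def order(key):
    """Names of the four patterns, sorted by the given interpretation."""
    return [names[b] for b in sorted(names, key=key)]

unsigned_order = order(lambda b: b)
signed_order = order(as_signed)
float_order = order(as_float)
```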

• FUN $X,$Y,$Z ‘floating unordered’. 

Register X is set to 1 if $Y and $Z are unordered according to the conventions of 
floating point arithmetic (namely, if either one is a NaN); otherwise register X is set 
to 0. No invalid exception occurs, not even when $Y or $Z is a signaling NaN. 

The IEEE standard discusses 26 different possible relations on floating point numbers;
MMIX implements 14 of them with single instructions, followed by a branch (or
by a ZS to make a “pure” 0 or 1 result); all 26 can be evaluated with a sequence of at
most four MMIX commands and a subsequent branch. The hardest case to handle is
‘?>=’ (unordered or greater or equal, to be computed without exceptions), for which
the following sequence makes $X ≥ 0 if and only if $Y ?>= $Z:

        FUN   $255,$Y,$Z
        BP    $255,1F       % skip ahead if unordered
        FCMP  $X,$Y,$Z      % $X=[$Y>$Z]-[$Y<$Z]; no exceptions will arise
1H      CSNZ  $X,$255,1     % $X=1 if unordered
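
The control flow of this four-instruction sequence can be traced with a small Python model. The functions fun and fcmp are simplified stand-ins for the MMIX instructions (they ignore exception flags), and unordered_or_ge is a hypothetical name for the whole sequence.

```python
import math

def fun(y, z):
    """FUN: 1 if the operands are unordered (either is a NaN), else 0."""
    return int(math.isnan(y) or math.isnan(z))

def fcmp(y, z):
    """FCMP for ordered operands: [y > z] - [y < z]."""
    return (y > z) - (y < z)

def unordered_or_ge(y, z):
    r255 = fun(y, z)        # FUN  $255,$Y,$Z
    x = -1                  # arbitrary old contents of $X
    if not r255 > 0:        # BP   $255,1F skips the comparison when unordered
        x = fcmp(y, z)      # FCMP $X,$Y,$Z
    if r255 != 0:           # 1H CSNZ $X,$255,1
        x = 1
    return x
```

The result is nonnegative exactly when the operands are unordered or $Y ≥ $Z; equal operands give 0, which is why the test on $X must include zero.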

24. Exercise: Suppose MMIX had no FINT instruction. Explain how to obtain the
equivalent of FINT $X,$Z using other instructions. Your program should do the proper
thing with respect to NaNs and exceptions. (For example, it should cause an invalid
exception if and only if $Z is a signaling NaN; it should cause an inexact exception
only if $Z needs to be rounded to another value.)

Answer: (The assembler prefixes hexadecimal constants by #.)

        SETH  $0,#4330      % $0=2^52
        SET   $1,$Z         % $1=$Z
        ANDNH $1,#8000      % $1=abs($Z)
        ANDN  $2,$Z,$1      % $2=signbit($Z)
        FUN   $3,$Z,$Z      % $3=[$Z is a NaN]
        BNZ   $3,1F         % skip ahead if $Z is a NaN
        FCMP  $3,$1,$0      % $3=[abs($Z)>2^52]-[abs($Z)<2^52]
        CSNN  $0,$3,0       % set $0=0 if $3>=0
        OR    $0,$2,$0      % attach sign of $Z to $0
1H      FADD  $1,$Z,$0      % $1=$Z+$0
        FSUB  $X,$1,$0      % $X=$1-$0
        OR    $X,$2,$X      % make sure minus zero isn’t lost


This program handles most cases of interest by adding and subtracting ±2^52 using
floating point arithmetic. It would be incorrect to do this in all cases; for example,
such addition/subtraction might fail to give the correct answer when $Z is a small
negative quantity (if rounding toward zero), or when $Z is a number like 2^105 + 2^53
(if rounding to nearest).
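
The idea can be tried out in Python, whose floats are IEEE doubles under round-to-nearest. This is only a sketch of the technique under that single rounding mode: the name fint_nearest is an assumption, exception flags are not modeled, and NaNs are simply passed through.

```python
import math

TWO52 = 2.0 ** 52

def fint_nearest(z):
    """Round z to a floating point integer by adding and subtracting +-2^52."""
    if math.isnan(z):
        return z
    magic = TWO52 if abs(z) < TWO52 else 0.0   # large values are already integers
    magic = math.copysign(magic, z)            # attach the sign of z
    result = (z + magic) - magic
    return math.copysign(result, z)            # make sure minus zero isn't lost
```

Between 2^52 and 2^53 the spacing of doubles is exactly 1, so the addition forces the hardware to round to an integer, and the subtraction is then exact. Without the final sign fix, an input such as −0.3 would come out as +0.0.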



MMIX: FLOATING POINT COMPUTATIONS 




25. MMIX goes beyond the IEEE standard to define additional relations between
floating point numbers, as suggested by the theory in Section 4.2.2 of Seminumerical
Algorithms. Given a nonnegative number ε, each normal floating point number
u = (f, e) has a neighborhood

    N_ε(u) = {x | |x − u| ≤ 2^(e−1022) ε};

we also define N_ε(0) = {0}, N_ε(u) = {x | |x − u| ≤ 2^(−1021) ε} if u is subnormal;
N_ε(±∞) = {±∞} if ε < 1, N_ε(±∞) = {everything except ∓∞} if 1 ≤ ε < 2,
N_ε(±∞) = {everything} if ε ≥ 2. Then we write

    u ≺ v (ε), if u < N_ε(v) and N_ε(u) < v;
    u ∼ v (ε), if u ∈ N_ε(v) or v ∈ N_ε(u);
    u ≈ v (ε), if u ∈ N_ε(v) and v ∈ N_ε(u);
    u ≻ v (ε), if u > N_ε(v) and N_ε(u) > v.

• FCMPE $X,$Y,$Z ‘floating compare (with respect to epsilon)’. 

Register X is set to −1 if $Y ≺ $Z (rE) according to the conventions of Seminumerical
Algorithms as stated above; it is set to 1 if $Y ≻ $Z (rE) according to those
conventions; otherwise it is set to 0. Here rE is a floating point number in the special
epsilon register, which is used only by the floating point comparison operations FCMPE,
FEQLE, and FUNE. An invalid exception occurs, and the result is zero, if any of $Y,
$Z, or rE are NaN, or if rE is negative. If no such exception occurs, exactly one of
the three conditions $Y ≺ $Z, $Y ∼ $Z, $Y ≻ $Z holds with respect to rE.

• FEQLE $X,$Y,$Z ‘floating equivalent (with respect to epsilon)’. 

Register X is set to 1 if $Y ≈ $Z (rE) according to the conventions of Seminumerical
Algorithms as stated above; otherwise it is set to 0. An invalid exception occurs,
and the result is zero, if any of $Y, $Z, or rE are NaN, or if rE is negative. Notice
that the relation $Y ≈ $Z computed by FEQLE is stronger than the relation $Y ∼ $Z
computed by FCMPE.

• FUNE $X,$Y,$Z ‘floating unordered (with respect to epsilon)’. 

Register X is set to 1 if $Y, $Z, or rE are exceptional as discussed for FCMPE and FEQLE;
otherwise it is set to 0. No exceptions occur, even if $Y, $Z, or rE is a signaling NaN.

Exercise: What floating point numbers does FCMPE regard as ∼ 0.0 with respect
to ε = 1/2, when no exceptions arise? Answer: Zero, subnormal numbers, and
normal numbers with f = 0. (The numbers similar to zero with respect to ε are zero,
subnormal numbers with f ≤ 2ε, normal numbers with f ≤ 2ε − 1, and ±∞ if ε ≥ 1.)
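
The neighborhood definitions, and the answer to the exercise, can be checked with exact rational arithmetic. The sketch below covers finite operands only; it assumes a neighborhood radius of 2^(e−1022) ε for normal numbers and 2^(−1021) ε for subnormal ones, and the names radius, fcmpe, and feqle are hypothetical helpers rather than MMIX's implementation.

```python
import math
from fractions import Fraction

def radius(u, eps):
    """Half-width of the neighborhood of a finite float u for a given epsilon."""
    if u == 0:
        return Fraction(0)                  # the neighborhood of 0 is {0}
    _, exp = math.frexp(u)                  # |u| = m * 2**exp with 0.5 <= m < 1
    return Fraction(2) ** max(exp, -1021) * Fraction(eps)

def fcmpe(y, z, eps):
    """-1 if y definitely precedes z, +1 if it definitely follows, else 0 (similar)."""
    fy, fz = Fraction(y), Fraction(z)
    ry, rz = radius(y, eps), radius(z, eps)
    if fy < fz - rz and fy + ry < fz:
        return -1
    if fy > fz + rz and fy - ry > fz:
        return 1
    return 0

def feqle(y, z, eps):
    """1 if each operand lies in the other's neighborhood, else 0."""
    d = abs(Fraction(y) - Fraction(z))
    return int(d <= radius(z, eps) and d <= radius(y, eps))
```

With ε = 1/2, the value 1.0 (fraction f = 0) is similar to zero while 1.5 is not, in agreement with the answer above.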

26. The IEEE standard also defines 32-bit floating point quantities, which it calls
“single format” numbers. MMIX calls them short floats, and converts between 32-bit
and 64-bit forms when such numbers are loaded from memory or stored into
memory. A short float consists of a sign bit followed by an 8-bit exponent and a 23-bit
fraction. After it has been loaded into one of MMIX’s registers, its 52-bit fraction
part will have 29 trailing zero bits, and its exponent e will be one of the 256 values 0,
(01110000001)_2 = 897, (01110000010)_2 = 898, ..., (10001111110)_2 = 1150, or 2047,
unless it was subnormal; a subnormal short float loads into a normal number with
874 ≤ e ≤ 896.
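
These exponent ranges can be confirmed by widening a few 32-bit patterns to doubles. The sketch relies on Python's struct module for the conversion; the names ldsf, exponent_field, and fraction_field are assumptions, and NaN payloads are deliberately left out because the host platform may alter them.

```python
import struct

def ldsf(bits32):
    """Widen a 32-bit short float pattern to the 64-bit pattern of the same value."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits32))
    (bits64,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits64

def exponent_field(bits64):
    """The 11-bit exponent e of a 64-bit floating point pattern."""
    return (bits64 >> 52) & 0x7FF

def fraction_field(bits64):
    """The 52-bit fraction of a 64-bit floating point pattern."""
    return bits64 & ((1 << 52) - 1)
```

The smallest and largest subnormal short floats have the patterns #00000001 and #007fffff; the smallest and largest normal ones are #00800000 and #7f7fffff.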



• LDSF $X,$Y,$Z|Z ‘load short float’. 

Register X is set to the 64-bit floating point number corresponding to the 32-bit
floating point number represented by M4[$Y + $Z] or M4[$Y + Z]. No arithmetic
exceptions occur, not even if a signaling NaN is loaded.

• STSF $X,$Y,$Z|Z ‘store short float’. 

The value obtained by rounding register X to a 32-bit floating point number is placed
in M4[$Y + $Z] or M4[$Y + Z]. Rounding is done with the current rounding mode, in
a manner exactly analogous to the standard conventions for rounding 64-bit results,
except that the precision and exponent range are limited. In particular, floating
overflow, underflow, and inexact exceptions might occur; a signaling NaN will trigger
an invalid exception and it will become quiet. The fraction part of a NaN is truncated
if necessary to a multiple of 2^−23, by ignoring the least significant 29 bits.

If we load any two short floats and operate on them once with either FADD, FSUB, 
FMUL, FDIV, FREM, FSQRT, or FINT, and if we then store the result as a short float, 
we obtain the results required by the IEEE standard for single format arithmetic, 
because the double format can be shown to have enough precision to avoid any 
problems of “double rounding.” But programmers are usually better off sticking 
to 64-bit arithmetic unless they have a strong reason to emulate the precise behavior 
of a 32-bit computer; 32 bits do not offer much precision. 

27. Of course we need to be able to go back and forth between integers and floating
point values.

• FIX $X,$Z ‘convert floating to fixed’.

The floating point number in register Z is converted to an integer as with the FINT
instruction, and the resulting integer (mod 2^64) is placed in register X. An invalid
exception occurs if $Z is infinite or a NaN; in that case $X is simply set equal to $Z.
A float-to-fix exception occurs if the result is less than −2^63 or greater than 2^63 − 1.

• FIXU $X,$Z ‘convert floating to fixed unsigned’. 

This instruction is identical to FIX except that no float-to-fix exception occurs. 
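
A small Python model makes the "mod 2^64" convention concrete. It assumes round-to-nearest (Python's round also sends ties to even), handles finite inputs only, and reports the float-to-fix exception as a boolean; the function names fix and fixu mirror the instructions but are otherwise illustrative.

```python
import math

def fix(z):
    """Model of FIX for finite z: returns (register value, float-to-fix exception?)."""
    if math.isnan(z) or math.isinf(z):
        raise ValueError("invalid exception: $X would simply be set equal to $Z")
    n = round(z)                                    # integer nearest z, ties to even
    overflow = n < -(1 << 63) or n > (1 << 63) - 1  # outside the signed 64-bit range
    return n % (1 << 64), overflow

def fixu(z):
    """Model of FIXU: the same register value, but no float-to-fix exception."""
    value, _ = fix(z)
    return value
```

Thus −1.0 becomes the pattern of all ones, and 2^63 fits as an unsigned result even though FIX would flag it.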

• FLOT $X,$Z|Z ‘convert fixed to floating’.

The integer in $Z or the immediate constant Z is converted to the nearest floating 
point value (using the current rounding mode) and placed in register X. A floating 
inexact exception occurs if rounding is necessary. 

• FLOTU $X,$Z|Z ‘convert fixed to floating unsigned’.

FLOTU is like FLOT, but $Z is treated as an unsigned integer. 

• SFLOT $X,$Z|Z ‘convert fixed to short float’; SFLOTU $X,$Z|Z ‘convert fixed to 
short float unsigned’. 

The SFLOT instructions are like the FLOT instructions, except that they round to a
floating point number whose fraction part is a multiple of 2^−23. (Thus, the resulting
value will not be changed by a “store short float” instruction.) Such conversions 
appear in MMIX’s repertoire only to establish complete conformance with the IEEE 
standard; a programmer needs them only when emulating a 32-bit machine.

28. Since the variants of FIX and FLOT involve only one input operand ($Z or Z),
their Y field is normally zero. A programmer can, however, force the mode of rounding
used with these commands by setting

    Y = 1, ROUND_OFF (none);
    Y = 2, ROUND_UP (away from zero);
    Y = 3, ROUND_DOWN (toward zero);
    Y = 4, ROUND_NEAR (to closest);

for example, the instruction FLOTU $X,ROUND_OFF,$Z will set the exponent e of
register X to 1086 − l if $Z is a nonzero quantity with l leading zero bits. Thus we can
count leading zeros by continuing with SETL $0,1086; SR $X,$X,52; SUB $X,$0,$X;
CSZ $X,$Z,64.
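
The leading-zero trick can be modeled in Python. Since Python converts integers to floats with round-to-nearest, the sketch imitates ROUND_OFF by discarding the bits that do not fit into 53 bits of precision before converting; that interpretation of ROUND_OFF as truncation, and the names flotu_round_off and leading_zeros, are assumptions of this illustration.

```python
import struct

def flotu_round_off(z):
    """Unsigned 64-bit integer to double, dropping low-order bits that don't fit."""
    n = z.bit_length()
    if n > 53:
        z &= ~((1 << (n - 53)) - 1)     # truncate so that float(z) is exact
    return float(z)

def leading_zeros(z):
    """Count leading zero bits of a 64-bit value via the exponent of FLOTU's result."""
    if z == 0:
        return 64                       # CSZ $X,$Z,64
    pattern = struct.unpack(">Q", struct.pack(">d", flotu_round_off(z)))[0]
    e = pattern >> 52                   # SR  $X,$X,52 (the sign bit is zero)
    return 1086 - e                     # SUB $X,$0,$X with $0 = 1086
```

Truncation is essential: with round-to-nearest, an input such as 2^64 − 1 would round up to 2^64 and the exponent would be one too large.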

The Y field can also be used in the same way to specify any desired rounding 
mode in the other floating point instructions that have only a single operand, namely 
FSQRT and FINT. An illegal instruction interrupt occurs if Y exceeds 4 in any of these 
commands.

29. Subroutine linkage. MMIX has several special operations designed to facilitate
the process of calling and implementing subroutines. The key notion is the idea
of a hardware-supported register stack, which can coexist with a software-supported
stack of variables that are not maintained in registers. From a programmer’s standpoint,
MMIX maintains a potentially unbounded list S[0], S[1], ..., S[τ−1] of octabytes
holding the contents of registers that are temporarily inaccessible; initially τ = 0.
When a subroutine is entered, registers can be “pushed” on to the end of this list,
increasing τ; when the subroutine has finished its execution, the registers are “popped”
off again and τ decreases.

Our discussion so far has treated all 256 registers $0, $1, . . . , $255 as if they were 
alike. But in fact, MMIX maintains two internal one-byte counters L and G, where 
0 ≤ L ≤ G < 256, with the property that

registers 0, 1, ..., L − 1 are “local”;
registers L, L + 1, ..., G − 1 are “marginal”;
registers G, G + 1, ..., 255 are “global.”

A marginal register is zero when its value is read. 

The G counter is normally set to a fixed value once and for all when a program 
is loaded, thereby defining the number of program variables that will live entirely in 
registers rather than in memory during the course of execution. A programmer may, 
however, change G dynamically using the PUT instruction described below. 

The L counter starts at 0. If an instruction places a value into a register that is
currently marginal, namely a register x such that L ≤ x < G, the value of L will
increase to x + 1, and any newly local registers will be zero. For example, if L = 10
and G = 200, the instruction ADD $5,$15,1 would simply set $5 to 1. But the
instruction ADD $15,$5,$200 would set $10, $11, ..., $14 to zero, $15 to $5 + $200,
and L to 16. (The process of clearing registers and increasing L might take quite a 
few machine cycles in the worst case. We will see later that MMIX is able to take care 
of any high-priority interrupts that might occur during this time.) 
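
The behavior of marginal registers can be captured in a few lines. The class below is a toy model under stated assumptions: registers are plain Python integers, the class name Registers and its methods read and write are invented for this sketch, and the demo function replays the two ADD instructions of the example.

```python
class Registers:
    """Toy register file with local, marginal, and global registers."""

    def __init__(self, L, G):
        self.L, self.G = L, G
        self.value = [0] * 256

    def read(self, k):
        if self.L <= k < self.G:
            return 0                        # a marginal register reads as zero
        return self.value[k]

    def write(self, k, v):
        if self.L <= k < self.G:            # writing a marginal register grows L
            for j in range(self.L, k):
                self.value[j] = 0           # newly local registers become zero
            self.L = k + 1
        self.value[k] = v

def demo():
    r = Registers(L=10, G=200)
    r.value[12] = 99                        # stale contents of a marginal register
    r.value[200] = 2                        # a global register
    r.write(5, r.read(15) + 1)              # ADD $5,$15,1
    after_first = (r.read(5), r.L)
    r.write(15, r.read(5) + r.read(200))    # ADD $15,$5,$200
    return after_first, (r.read(15), r.read(12), r.L)
```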

• PUSHJ $X,@+4*YZ [-262144] ‘push registers and jump’. 

• PUSHGO $X,$Y,$Z|Z ‘push registers and go’. 

Suppose first that X < L. Register X is set equal to the number X, then registers 0,
1, ..., X are pushed onto the register stack as described below. If this instruction is
in location A, the value A + 4 is placed into the special return-jump register rJ. Then
control jumps to instruction A + 4YZ or A + 4YZ − 262144 or $Y + $Z or $Y + Z, as
in a JMP or GO command.

Pushing the first X + 1 registers onto the stack means essentially that we set
S[τ] ← $0, S[τ+1] ← $1, ..., S[τ+X] ← $X, τ ← τ + X + 1, $0 ← $(X+1),
..., $(L−X−2) ← $(L−1), L ← L − X − 1. For example, if X = 1 and L = 5, the
current contents of $0 and the number 1 are placed on the register stack, where they
will be temporarily inaccessible. Then control jumps to a subroutine with L reduced
to 3; the registers that we had been calling $2, $3, and $4 appear as $0, $1, and $2
to the subroutine.

If L ≤ X < G, the value of L increases to X + 1 as described above; then the rules
for X < L apply.

If X ≥ G the actions are similar, except that all of the local registers $0, ..., $(L−1)
are placed on the register stack followed by the number L, and L is reset to zero. In
particular, the instruction PUSHGO $255,$Y,$Z pushes all the local registers onto the
stack and sets L to zero, regardless of the previous value of L.

We will see later that MMIX is able to achieve the effect of pushing and renaming 
local registers without actually doing very much work at all. 

• POP X,YZ ‘pop registers and return from subroutine’. 

This command preserves X of the current local registers, undoes the effect of the most 
recent PUSHJ or PUSHGO, and jumps to the instruction in M4[4YZ + rJ]. If X > 0, 
the value of $(X − 1) goes into the “hole” position where PUSHJ or PUSHGO stored the
number of registers previously pushed. 

The formal details of POP are slightly complicated, but we will see that they make
sense: If X > L, we first replace X by L + 1. Then we set x ← S[τ−1] mod 256; this
is the effective value of the X field in the push instruction that is being undone. Stack
position S[τ−1] is now set to $(X−1) if 0 < X ≤ L, otherwise it is set to zero. Then
we essentially set L ← min(x + X, G), $(L−1) ← $(L−x−2), ..., $(x+1) ← $0,
$x ← S[τ−1], ..., $0 ← S[τ−x−1], τ ← τ − x − 1. The operating system should
arrange things so that a memory-protection interrupt will occur if a program does
more pops than pushes. (If x > G, these formulas don’t make sense as written; we
actually set $j ← S[τ−x−1+j] for L > j ≥ 0 in that rare case.)

Suppose, for example, that a subroutine has three input parameters ($0, $1, $2) and 
produces two outputs ($0, $1). If the subroutine does not call any other subroutines, 
it can simply end with POP 2,0, because rJ will contain the return address. Otherwise 
it should begin by saving rJ, for example with the instruction GET $4,rJ if it will be 
using local registers $0 through $3, and it should use PUSHJ $5 or PUSHGO $5 when 
calling sub-subroutines; finally it should PUT rJ,$4 before saying POP 2,0. To call 
the subroutine from another routine that has, say, 6 local registers, we would put the 
input arguments into $7, $8, and $9, then issue the command PUSHGO $6,base,Subr;
in due time the outputs of the subroutine will appear in $7 and $6. 

Notice that the push and pop commands make use of a one-place “hole” in the 
register stack, between the registers that are pushed down and the registers that 
remain local. (The hole is position $6 in the example just considered.) MMIX needs 
this hole position to remember the number of registers that are pushed down. A 
subroutine with no outputs ends with POP 0,0 and the hole disappears (becomes
marginal). A subroutine with one output $0 ends with POP 1,0 and the hole gets the
former value of $0. A subroutine with two outputs ($0, $1) ends with POP 2,0 and 
the hole gets the former value of $1; in this case, therefore, the relative order of the 
two outputs has been switched on the register stack. If a subroutine has, say, five 
outputs ($0, . . . , $4), it ends with POP 5,0 and $4 goes into the hole position, where 
it is followed by ($0, $1, $2, $3). MMIX makes this curious permutation in the case of 
multiple outputs because the hole is most easily plugged by moving one value down 
(namely $4) instead of by sliding each of five values down in the stack.

These conventions for parameter passing are admittedly a bit confusing in the
general case, and I suppose people who use them extensively might someday find
themselves talking about “the infamous MMIX register shuffle.” However, there is
good use for subroutines that convert a sequence of register contents like (x, a, b, c)
into (f, a, b, c) where f is a function of a, b, and c but not x. Moreover, PUSHGO and
POP can be implemented with great efficiency, and subroutine linkage tends to be a
significant bottleneck when other conventions are used.
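
The push and pop rules, including the shuffle of outputs through the hole, can be followed step by step in a toy Python model. This is a sketch only: it keeps the local registers and the register stack as Python lists, ignores rJ, global registers, and the rare case x > G, and all names (RegisterStack, call_example, five_outputs) are invented for the illustration.

```python
class RegisterStack:
    """Toy model of local registers $0..$(L-1) and the register stack S."""

    def __init__(self, G=32):
        self.G = G
        self.local = []                     # $0 .. $(L-1)
        self.stack = []                     # S[0] .. S[tau-1]

    @property
    def L(self):
        return len(self.local)

    def get(self, k):
        return self.local[k] if k < self.L else 0    # marginal reads as zero

    def set(self, k, value):
        assert k < self.G, "global registers are not modeled"
        self.local.extend([0] * (k + 1 - self.L))    # newly local registers are zero
        self.local[k] = value

    def push(self, X):
        if X >= self.G:                     # push every local, then the number L
            self.stack += self.local + [self.L]
            self.local = []
            return
        self.set(X, X)                      # the hole records the number X
        self.stack += self.local[:X + 1]
        self.local = self.local[X + 1:]

    def pop(self, X):
        if X > self.L:
            X = self.L + 1
        x = self.stack[-1] % 256            # number of registers below the hole
        hole = self.local[X - 1] if 0 < X <= self.L else 0
        outputs = self.local[:max(X - 1, 0)]         # $0..$(X-2) follow the hole
        restored = self.stack[-(x + 1):-1]
        del self.stack[-(x + 1):]
        kept = [hole] + outputs if X > 0 else []
        self.local = (restored + kept)[:self.G]

def call_example():
    """Caller with six locals passes 2, 3, 4; the subroutine returns sum and product."""
    rs = RegisterStack()
    for k in range(6):
        rs.set(k, 100 + k)
    rs.set(7, 2); rs.set(8, 3); rs.set(9, 4)
    rs.push(6)                              # PUSHGO $6,base,Subr
    a, b, c = rs.get(0), rs.get(1), rs.get(2)
    rs.set(0, a + b + c)
    rs.set(1, a * b * c)
    rs.pop(2)                               # POP 2,0
    return rs.local

def five_outputs():
    """A subroutine with outputs $0..$4 leaves ($4, $0, $1, $2, $3) behind."""
    rs = RegisterStack()
    rs.push(0)
    for k in range(5):
        rs.set(k, 10 + k)
    rs.pop(5)
    return rs.local
```

In call_example the subroutine's $0 (the sum) ends up in the caller's $7 and its $1 (the product) in the hole position $6, as described above.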

Information about a subroutine’s calling conventions needs to be communicated to 
a debugger. That can readily be done at the same time as we inform the debugger 
about the symbolic names of addresses in memory. 

A subroutine that uses 50 local registers will not function properly if it is called by 
a program that sets G less than 50. MMIX does not allow the value of G to become less 
than 32. Therefore any subroutine that avoids global registers and uses at most 32 
local registers can be sure to work properly regardless of the current value of G. 

The rules stated above imply that a PUSHJ or PUSHGO instruction with X = 255 
pushes all of the currently defined local registers onto the stack and sets L to zero. 
This makes G local registers available for use by the subroutine jumped to. If that 
subroutine later returns with POP 0,0, the former value of L and the former contents 
of $0, ..., $(L − 1) will be restored (assuming that G doesn’t decrease).

A POP instruction with X = 255 preserves all the local registers as outputs of the 
subroutine (provided that the total doesn’t exceed G after popping), and puts zero 
into the hole (unless L = G = 255). The best policy, however, is almost always to use 
POP with a small value of X, and in general to keep the value of L as small as possible 
by decreasing it when registers are no longer active. A smaller value of L means that 
MMIX can change context more easily when switching from one process to another.

30. System considerations. High-performance implementations of MMIX gain
speed by keeping caches of instructions and data that are likely to be needed as
computation proceeds. [See M. V. Wilkes, IEEE Transactions EC-14 (1965), 270–271;
J. S. Liptay, IBM System J. 7 (1968), 15–21.] Careful programmers can make
the computer run even faster by giving hints about how to maintain such caches.

• LDUNC $X,$Y,$Z|Z ‘load octa uncached’. 

These instructions, which have the same meaning as LDO, also inform the computer 
that the loaded octabyte (and its neighbors in a cache block) will probably not be 
read or written in the near future. 

• STUNC $X,$Y,$Z|Z ‘store octa uncached’. 

These instructions, which have the same meaning as STO, also inform the computer 
that the stored octabyte (and its neighbors in a cache block) will probably not be 
read or written in the near future. 

• PRELD X,$Y,$Z|Z ‘preload data’. 

These instructions have no effect on registers or memory, but they inform the computer
that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z]
through M[$Y + Z + X], will probably be loaded and/or stored in the near future. No
protection failure occurs if the memory is not accessible.

• PREGO X,$Y,$Z|Z ‘prefetch to go’. 

These instructions have no effect on registers or memory, but they inform the computer
that many of the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z]
through M[$Y + Z + X], will probably be used as instructions in the near future. No
protection failure occurs if the memory is not accessible.

• PREST X,$Y,$Z|Z ‘prestore data’. 

These instructions have no effect on registers or memory if the computer has no data
cache. But when such a cache exists, they inform the computer that all of the X + 1
bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X],
will definitely be stored in the near future before they are loaded. (Therefore it 
is permissible for the machine to ignore the present contents of those bytes. Also, if 
those bytes are being shared by several processors, the current processor should try to 
acquire exclusive access.) No protection failure occurs if the memory is not accessible.

• SYNCD X,$Y,$Z|Z ‘synchronize data’.

When executed from nonnegative locations, these instructions have no effect on
registers or memory if neither a write buffer nor a “write back” data cache is present.
But when such a buffer or cache exists, they force the computer to make sure that all 
data for the X + 1 bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through 
M[$Y + Z + X], will be present in memory. (Otherwise the result of a previous store 
instruction might appear only in the cache; the computer is being told that now is 
the time to write the information back, if it hasn’t already been written. A program 
can use this feature before outputting directly from memory.) No protection failure 
occurs if the memory is not accessible. 

The action is similar when SYNCD is executed from a negative address, but in this 
case the specified bytes are also removed from the data cache (and from a secondary 
cache, if present). The operating system can use this feature when a page of virtual 
memory is being swapped out, or when data is input directly into memory. 

• SYNCID X,$Y,$Z|Z ‘synchronize instructions and data’. 

When executed from nonnegative locations these instructions have no effect on registers
or memory if the computer has no instruction cache separate from a data cache.
But when such a cache exists, they force the computer to make sure that the X + 1
bytes M[$Y + $Z] through M[$Y + $Z + X], or M[$Y + Z] through M[$Y + Z + X], will be
interpreted correctly if used as instructions before they are next modified. (Generally
speaking, an MMIX program is not expected to store anything in memory locations 
that are also being used as instructions. Therefore MMIX’s instruction cache is al- 
lowed to become inconsistent with respect to its data cache. Programmers who insist 
on executing instructions that have been fabricated dynamically, for example when 
setting a breakpoint for debugging, must first SYNCID those instructions in order to 
guarantee that the intended results will be obtained.) A SYNCID command might be 
implemented in several ways; for example, the machine might update its instruction 
cache to agree with its data cache. A simpler solution, which is good enough because 
the need for SYNCID ought to be rare, removes instructions in the specified range from 
the instruction cache, if present, so that they will have to be fetched from memory 
the next time they are needed; in this case the machine also carries out the effect of 
a SYNCD command. No protection failure occurs if the memory is not accessible. 

The behavior is more drastic, but faster, when SYNCID is executed from a negative 
location. Then all bytes in the specified range are simply removed from all caches, 
and the memory corresponding to any “dirty” cache blocks involving such bytes is 
not brought up to date. An operating system can use this version of the command 
when pages of virtual memory are being discarded (for example, when a program is 
being terminated). 

31. MMIX is designed to work not only on a single processor but also in situations 
where several processors share a common memory. The following commands are useful 
for efficient operation in such circumstances. 

• CSWAP $X,$Y,$Z|Z ‘compare and swap octabytes’. 

If the octabyte M8[$Y + $Z] or M8[$Y + Z] is equal to the contents of the special
prediction register rP, it is replaced in memory with the contents of register X, and
register X is set equal to 1. Otherwise the octabyte in memory replaces rP and
register X is set to zero. This is an atomic (indivisible, uninterruptible) operation,
useful for interprocess communication when independent computers are sharing the
same memory.
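
The data flow of CSWAP, and the usual retry loop built on it, can be sketched in Python. The model is sequential, so it shows only what the instruction computes and not its atomicity; memory is a plain dictionary, and the names cswap and atomic_increment are assumptions of this illustration.

```python
def cswap(memory, address, rP, x):
    """Model of CSWAP: returns the new (rP, $X) and updates memory on success."""
    if memory[address] == rP:
        memory[address] = x         # prediction was right: store register X
        return rP, 1
    return memory[address], 0       # prediction was wrong: memory replaces rP

def atomic_increment(memory, address):
    """Typical usage: retry until the predicted old value is still in memory."""
    rP = memory[address]
    while True:
        rP, ok = cswap(memory, address, rP, rP + 1)
        if ok:
            return rP + 1
```

On failure the caller receives the current memory contents in rP, so the next attempt starts from an up-to-date prediction without a separate load.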

The compare-and-swap operation was introduced by IBM in late models of the
System/370 architecture, and it soon spread to several other machines. Significant
ways to use it are discussed, for example, in section 7.2.3 of Harold Stone’s
High-Performance Computer Architecture (Reading, Massachusetts: Addison-Wesley,
1987), and in sections 8.2 and 8.3 of Transaction Processing by Jim Gray and Andreas
Reuter (San Francisco: Morgan Kaufmann, 1993).

• SYNC XYZ ‘synchronize’. 

If XYZ = 0, the machine drains its pipeline (that is, it stalls until all preceding 
instructions have completed their activity). If XYZ = 1, the machine controls its 
actions less drastically, in such a way that all store instructions preceding this SYNC 
will be completed before all store instructions after it. If XYZ = 2, the machine 
controls its actions in such a way that all load instructions preceding this SYNC will 
be completed before all load instructions after it. If XYZ = 3, the machine controls 
its actions in such a way that all load or store instructions preceding this SYNC will 
be completed before all load or store instructions after it. If XYZ = 4, the machine 
goes into a power-saver mode, in which instructions may be executed more slowly 
(or not at all) until some kind of “wake-up” signal is received. If XYZ = 5, the 
machine empties its write buffer and cleans its data caches, if any (including a possible 
secondary cache); the caches retain their data, but the cache contents also appear in 
memory. If XYZ = 6, the machine clears its virtual address translation caches (see 
below). If XYZ = 7, the machine clears its instruction and data caches, discarding 
any information in the data caches that wasn’t previously in memory. (“Clearing” is
stronger than “cleaning”; a clear cache remembers nothing. Clearing is also faster, 
because it simply obliterates everything.) If XYZ > 7, an illegal instruction interrupt 
occurs. 

Of course no SYNC is necessary between a command that loads from or stores 
into memory and a subsequent command that loads from or stores into exactly the 
same location. However, SYNC might be necessary in certain cases even on a one- 
processor system, because input/output processes take place in parallel with ordinary 
computation. 

The cases XYZ > 3 are privileged, in the sense that only the operating system 
can use them. More precisely, if a SYNC command is encountered with XYZ = 4 or 
XYZ = 5 or XYZ = 6 or XYZ = 7, a “privileged instruction interrupt” occurs unless 
that interrupt is currently disabled. Only the operating system can disable interrupts 
(see below).

32. Trips and traps. Special register rA records the current status information
about arithmetic exceptions. Its least significant byte contains eight “event” bits 
called DVWIOUZX from left to right, where D stands for integer divide check, V for 
integer overflow, W for float-to-fix overflow, I for invalid operation, O for floating 
overflow, U for floating underflow, Z for floating division by zero, and X for floating 
inexact. The next least significant byte of rA contains eight “enable” bits with the 
same names DVWIOUZX and the same meanings. When an exceptional condition 
occurs, there are two cases: If the corresponding enable bit is 0, the corresponding 
event bit is set to 1. But if the corresponding enable bit is 1, MMIX interrupts its 
current instruction stream and executes a special “exception handler.” Thus, the 
event bits record exceptions that have not been “tripped.” 

Floating point overflow always causes two exceptions, O and X. (The strictest 
interpretation of the IEEE standard would raise exception X on overflow only if 
floating overflow is not enabled, but MMIX always considers an overflowed result to be 
inexact.) Floating point underflow always causes both U and X when underflow is not 
enabled, and it might cause both U and X when underflow is enabled. If both enable 
bits are set to 1 in such cases, the overflow or underflow handler is called and the 
inexact handler is ignored. All other types of exceptions arise one at a time, so there 
is no ambiguity about which exception handler should be invoked unless exceptions 
are raised by “ropcode 2” (see below); in general the first enabled exception in the 
list DVWIOUZX takes precedence. 

What about the six high-order bytes of the status register rA? At present, only 
two of those 48 bits are defined; the others must be zero for compatibility with 
possible future extensions. The two bits corresponding to 2^17 and 2^16 in rA specify a
rounding mode, as follows: 00 means round to nearest (the default); 01 means round 
off (toward zero); 10 means round up (toward positive infinity); and 11 means round 
down (toward negative infinity). 
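
The layout of rA described above can be illustrated by a small decoder. This is only a sketch in Python, not part of MMIX itself; the names decode_rA, EVENT_NAMES, and ROUNDING_MODES are hypothetical, and the two-bit rounding code is assumed to be read with the 2^17 bit as its high-order bit.

```python
EVENT_NAMES = "DVWIOUZX"  # leftmost letter is the most significant bit of its byte
ROUNDING_MODES = {0b00: "nearest", 0b01: "off", 0b10: "up", 0b11: "down"}

def decode_rA(rA):
    """Split rA into (event names, enable names, rounding mode)."""
    if rA >> 18:
        # only 18 bits of rA are defined; the others must be zero
        raise ValueError("undefined bits of rA must be zero")

    def names(byte):
        return {n for i, n in enumerate(EVENT_NAMES) if byte & (0x80 >> i)}

    events = names(rA & 0xFF)            # least significant byte
    enables = names((rA >> 8) & 0xFF)    # next least significant byte
    mode = ROUNDING_MODES[(rA >> 16) & 0b11]
    return events, enables, mode
```

For instance, decode_rA(0x18001) reports the X event, the D enable bit, and rounding toward zero.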

33. The execution of MMIX programs can be interrupted in several ways. We have 
just seen that arithmetic exceptions will cause interrupts if they are enabled; so 
will illegal or privileged instructions, or instructions that are emulated in software 
instead of provided by the hardware. Input/output operations or external timers are
another common source of interrupts; the operating system knows how to deal with 
all gadgets that might be hooked up to an MMIX processor chip. Interrupts occur 
also when memory accesses fail — for example if memory is nonexistent or protected. 
Power failures that force the machine to use its backup battery power in order to keep 
running in an emergency, or hardware failures like parity errors, all must be handled 
as gracefully as possible. 

Users can also force interrupts to happen by giving explicit TRAP or TRIP instruc- 
tions: 

• TRAP X,Y,Z ‘trap’; TRIP X,Y,Z ‘trip’. 

Both of these instructions interrupt processing and transfer control to a handler. The 
difference between them is that TRAP is handled by the operating system but TRIP 
is handled by the user. More precisely, the X, Y, and Z fields of TRAP have special 
significance predefined by the operating system kernel. For example, a system call
(say an I/O command, or a command to allocate more memory) might be invoked by
certain settings of X, Y, and Z. The X, Y, and Z fields of TRIP, on the other hand, 
are definable by users for their own applications, and users also define their own 
handlers. “Trip handler” programs invoked by TRIP are interruptible, but interrupts 
are normally inhibited while a TRAP is being serviced. Specific details about the 
precise actions of TRIP and TRAP appear below, together with the description of 
another command called RESUME that returns control from a handler to the interrupted 
program. 

Only two variants of TRAP are predefined by the MMIX architecture: If XYZ = 0 in 
a TRAP command, a user process should terminate. If XYZ = 1, the operating system 
should provide default action for cases in which the user has not provided any handler 
for a particular kind of interrupt (see below). 

A few additional variants of TRAP are predefined in the rudimentary operating 
system used with MMIX simulators. These variants, which allow simple input/output 
operations to be done, all have X = 0, and the Y field is a small positive constant. 
For example, Y = 1 invokes the Fopen routine, which opens a file. (See the program
MMIX-SIM for full details.)

34. Non-catastrophic interrupts in MMIX are always precise, in the sense that all legal
instructions before a certain point have effectively been executed, and no instructions 
after that point have yet been executed. The current instruction, which may or may 
not have been completed at the time of interrupt and which may or may not need to 
be resumed after the interrupt has been serviced, is put into the special execution 
register rX, and its operands (if any) are placed in special registers rY and rZ. 
The address of the following instruction is placed in the special where-interrupted 
register rW. The instruction in rX might not be the same as the instruction in
location rW - 4; for example, it might be an instruction that branched or jumped
to rW. It might also be an instruction inserted internally by the MMIX processor.
(For example, the computer silently inserts an internal instruction that increases L
before an instruction like ADD $9,$1,$0 if L is currently less than 10. If an interrupt
occurs between the inserted instruction and the ADD, the instruction in rX will say
ADD, because an internal instruction retains the identity of the actual command that
spawned it; but rW will point to the real ADD command.)

When an instruction has the normal meaning “set $X to the result of $Y op $Z” 
or “set $X to the result of $Y op Z,” special registers rY and rZ will relate in the 
obvious way to the Y and Z operands of the instruction; but this is not always the 
case. For example, after an interrupted store instruction, the first operand rY will 
hold the virtual memory address ($Y plus either $Z or Z), and the second operand rZ 
will be the octabyte to be stored in memory (including bytes that have not changed, 
in cases like STB). In other cases the actual contents of rY and rZ are defined by each 
implementation of MMIX, and programmers should not rely on their significance. 

Some instructions take an unpredictable and possibly long amount of time, so it may 
be necessary to interrupt them in progress. For example, the FREM instruction (floating 
point remainder) is extremely difficult to compute rapidly if its first operand has an 
exponent of 2046 and its second operand has an exponent of 1. In such cases the rY
and rZ registers saved during an interrupt show the current state of the computation, 
not necessarily the original values of the operands. The value of rY rem rZ will still 
be the desired remainder, but rY may well have been reduced to a number that has 
an exponent closer to the exponent of rZ. After the interrupt has been processed, the 
remainder computation will continue where it left off. (Alternatively, an operation 
like FREM or even FADD might be implemented in software instead of hardware, as we 
will see later.) 

Another example arises with an instruction like PREST (prestore), which can specify 
prestoring up to 256 bytes. An implementation of MMIX might choose to prestore only 
32 or 64 bytes at a time, depending on the cache block size; then it can change the 
contents of rX to reflect the unfinished part of a partially completed PREST command. 

Commands that decrease G, pop the stack, save the current context, or unsave an 
old context also are interruptible. Register rX is used to communicate information 
about partial completion in such a way that the interruption will be essentially
“invisible” after a program is resumed.

35. Three kinds of interruption are possible: trips, forced traps, and dynamic traps.
We will discuss each of these in turn. 

A TRIP instruction puts itself into the right half of the execution register rX, and
sets the 32 bits of the left half to #80000000. (Therefore rX is negative; this fact will
tell the RESUME command not to TRIP again.) The special registers rY and rZ are set
to the contents of the registers specified by the Y and Z fields of the TRIP command, 
namely $Y and $Z. Then $255 is placed into the special bootstrap register rB, and 
$255 is set to rJ. MMIX now takes its next instruction from virtual memory address 0. 
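
The register changes made by TRIP can be summarized in a few lines of Python. This is a rough sketch under simplifying assumptions: the machine state is a hypothetical dictionary, general registers are modeled as a flat list of 256 values (ignoring the register stack), and rW is set as described in the earlier discussion of precise interrupts.

```python
def do_trip(state, instruction, following_address):
    """Apply the effects of a TRIP instruction to a simplified machine state.

    state is a dict holding a list 'g' of 256 general register values and
    the special registers used here; instruction is the 32-bit TRIP word.
    """
    y = (instruction >> 8) & 0xFF
    z = instruction & 0xFF
    state["rW"] = following_address                   # where to continue later
    state["rX"] = (0x80000000 << 32) | instruction    # left half #80000000
    state["rY"] = state["g"][y]                       # contents of $Y
    state["rZ"] = state["g"][z]                       # contents of $Z
    state["rB"] = state["g"][255]                     # $255 goes to rB
    state["g"][255] = state["rJ"]                     # rJ goes to $255
    state["pc"] = 0                                   # next instruction at address 0
    return state
```

Because the leading bit of rX is 1, a 64-bit signed reading of rX is negative, which is what RESUME tests.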

Arithmetic exceptions interrupt the computation in essentially the same way as 
TRIP, if they are enabled. The only difference is that their handlers begin at the 
respective addresses 16, 32, 48, 64, 80, 96, 112, and 128, for exception bits D, V, W, 
I, O, U, Z, and X of rA; registers rY and rZ are set to the operands of the interrupted 
instruction as explained earlier. 
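
As an illustration, the choice of handler address can be sketched as follows. The function name trip_address is hypothetical, and the sketch covers only the selection rule stated in the text (the first enabled exception in the list DVWIOUZX wins); it does not model what happens to the event bits.

```python
EVENT_NAMES = "DVWIOUZX"  # D is the leftmost (most significant) bit of the byte

def trip_address(raised, rA):
    """Return (name, handler address) for the exception that trips, or None.

    raised is an 8-bit mask laid out like the event byte of rA.
    """
    enabled = (rA >> 8) & 0xFF
    for i, name in enumerate(EVENT_NAMES):
        if raised & enabled & (0x80 >> i):
            return name, 16 * (i + 1)   # handlers sit at 16, 32, ..., 128
    return None
```

For example, when floating overflow raises both O and X with both enabled, the overflow handler at address 80 is chosen and the inexact handler is ignored.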

A 16-byte block of memory is just enough for a sequence of commands like 

PUSHJ 255,Handler; PUT rJ,$255; GET $255,rB; RESUME

which will invoke a user’s handler. And if the user does not choose to provide a 
custom-designed handler, the operating system provides a default handler via the 
instructions 

TRAP 1; GET $255,rB; RESUME.

A trip handler might simply record the fact that tripping occurred. But the handler 
for an arithmetic interrupt might want to change the default result of a computation. 
In such cases, the handler should place the desired substitute result into rZ, and it 
should change the most significant byte of rX from #80 to #02. This will have the
desired effect, because of the rules of RESUME explained below, unless the exception 
occurred on a command like STB or STSF. (A bit more work is needed to alter the 
effect of a command that stores into memory.) 

Instructions in negative virtual locations do not invoke trip handlers, either for 
TRIP or for arithmetic exceptions. Such instructions are reserved for the operating 
system, as we will see.

36. A TRAP instruction interrupts the computation essentially like TRIP, but with
the following modifications: (i) the interrupt mask register rK is cleared to zero, 
thereby inhibiting interrupts; (ii) control jumps to virtual memory address rT, not 
zero; (iii) information is placed in a separate set of special registers rBB, rWW, rXX, 
rYY, and rZZ, instead of rB, rW, rX, rY, and rZ. (These special registers are needed 
because a trap might occur while processing a TRIP.) 

Another kind of forced trap occurs on implementations of MMIX that emulate certain 
instructions in software rather than in hardware. Such instructions cause a TRAP even 
though their opcode is something else like FREM or FADD or DIV. The trap handler 
can tell what instruction to emulate by looking at the opcode, which appears in rXX. 
In such cases the left-hand half of rXX is set to #02000000; the handler emulating
FADD, say, should compute the floating point sum of rYY and rZZ and place the result 
in rZZ. A subsequent RESUME 1 will then place the value of rZZ in the proper register. 

When a forced trap occurs on a store instruction because of memory protection 
failure, the settings of rYY and rZZ are undefined. They do not necessarily correspond 
to the virtual address rY and the octabyte to be stored rZ that are supplied to a trip 
handler after a tripped store instruction, because a forced trap aborts its instruction 
as soon as possible. 

Implementations of MMIX might also emulate the process of virtual-address-to- 
physical-address translation described below, instead of providing for page table 
calculations in hardware. Then if, say, a LDB instruction does not know the physical 
memory address corresponding to a specified virtual address, it will cause a forced 
trap with the left half of rXX set to #03000000 and with rYY set to the virtual address
in question. The trap handler should place the physical page address into rZZ; then 
RESUME 1 will complete the LDB. 

37. The third and final kind of interrupt is called a dynamic trap. Such interruptions 
occur when one or more of the 64 bits in the special interrupt request register rQ have 
been set to 1, and when at least one corresponding bit of the special interrupt mask 
register rK is also equal to 1. The bit positions of rQ and rK have the general form

[Figure: bit fields of rQ and rK, from left to right: low-priority I/O | program | high-priority I/O | machine]

where the 8-bit “program” bits are called rwxnkbsp and have the following meanings: 

r bit: instruction tries to load from a page without read permission;
w bit: instruction tries to store to a page without write permission;
x bit: instruction appears in a page without execute permission;
n bit: instruction refers to a negative virtual address;
k bit: instruction is privileged, for use by the “kernel” only;
b bit: instruction breaks the rules of MMIX;
s bit: instruction violates security (see below);
p bit: instruction comes from a privileged (negative) virtual address.

Negative addresses are for the use of the operating system only; a security violation
occurs if an instruction in a nonnegative address is executed without the rwxnkbsp
bits of rK all set to 1. (In such cases the s bits of both rQ and rK are set to 1.)

The eight “machine” bits of rQ and rK represent the most urgent kinds of interrupts.
The rightmost bit stands for power failure, the next for memory parity error, the next 
for nonexistent memory, the next for rebooting, etc. Interrupts that need especially 
quick service, like requests from a high-speed network, also are allocated bit positions 
near the right end. Low priority I/O devices like keyboards are assigned to bits 
at the left. The allocation of input/output devices to bit positions will differ from 
implementation to implementation, depending on what devices are available. 

Once rQ ∧ rK becomes nonzero, the machine waits briefly until it can give a precise
interrupt. Then it proceeds as with a forced trap, except that it uses the special
“dynamic trap address register” rTT instead of rT. The trap handler that begins at
location rTT can figure out the reason for interrupt by examining rQ ∧ rK. (For
example, after the instructions

GET $0,rQ; LDOU $1,savedK; AND $0,$0,$1; SUBU $1,$0,1;
SADD $2,$1,$0; ANDN $1,$0,$1

the highest-priority offending bit will be in $1 and its position will be in $2.)
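
The bit manipulation behind this idiom can be followed step by step in Python. This sketch assumes the usual meanings of the instructions involved: SUBU subtracts modulo 2^64, SADD counts the 1 bits of its first operand that are 0 in its second, and ANDN forms the bitwise AND of its first operand with the complement of its second. The function name highest_priority is hypothetical.

```python
MASK64 = (1 << 64) - 1

def highest_priority(rQ, savedK):
    """Return (bit, position) of the rightmost 1 in rQ AND savedK."""
    pending = rQ & savedK                       # AND  $0,$0,$1
    below = (pending - 1) & MASK64              # SUBU $1,$0,1
    # below has 1s exactly in the positions to the right of the lowest 1
    # of pending (plus the higher 1s of pending, which SADD masks off)
    position = bin(below & ~pending & MASK64).count("1")   # SADD $2,$1,$0
    bit = pending & ~below & MASK64             # ANDN $1,$0,$1
    return bit, position
```

The rightmost bits are the most urgent, so the lowest set bit is the one to service first. If no bit is pending, the sketch returns a zero bit and position 64.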

If the interrupted instruction contributed 1s to any of the rwxnkbsp bits of rQ, the
corresponding bits are set to 1 also in rXX. A dynamic trap handler might be able to
use this information (although it should service higher-priority interrupts first if the
right half of rQ ∧ rK is nonzero).

The rules of MMIX are rigged so that only the operating system can execute in- 
structions with interrupts suppressed. Therefore the operating system can in fact use 
instructions that would interrupt an ordinary program. Control of register rK turns 
out to be the ultimate privilege, and in a sense the only important one. 

An instruction that causes a dynamic trap is usually executed before the interrup- 
tion occurs. However, an instruction that traps with bits x, k, or b does nothing; a 
load instruction that traps with r or n loads zero; a store instruction that traps with 
any of rwxnkbsp stores nothing.

38. After a trip handler or trap handler has done its thing, it generally invokes the
following command. 

• RESUME Z ‘resume after interrupt’; the X and Y fields must be zero. 

If the Z field of this instruction is zero, MMIX will use the information found in special 
registers rW, rX, rY, and rZ to restart an interrupted computation. If the execution 
register rX is negative, it will be ignored and instructions will be executed starting at 
virtual address rW; otherwise the instruction in the right half of the execution register
will be inserted into the program as if it had appeared in location rW - 4, subject to
certain modifications that we will explain momentarily, and the next instruction will 
come from rW. 

If the Z field of RESUME is 1 and if this instruction appears in a negative location, 
registers rWW, rXX, rYY, and rZZ are used instead of rW, rX, rY, and rZ. Also, 
just before resuming the computation, mask register rK is set to $255 and $255 is set 
to rBB. (Only the operating system gets to use this feature.) 

An interrupt handler within the operating system might choose to allow itself to 
be interrupted. In such cases it should save the contents of rBB, rWW, rXX, rYY, 
and rZZ on some kind of stack, before making rK nonzero. Then, before resuming 
whatever caused the base level interrupt, it must again disable all interrupts; this 
can be done with TRAP, because the trap handler can tell from the virtual address 
in rWW that it has been invoked by the operating system. Once rK is again zero, 
the contents of rBB, rWW, rXX, rYY, and rZZ are restored from the stack, the outer 
level interrupt mask is placed in $255, and RESUME 1 finishes the job. 

Values of Z greater than 1 are reserved for possible later definition. Therefore they 
cause an illegal instruction interrupt (that is, they set the ‘b’ bit of rQ) in the present 
version of MMIX. 

If the execution register rX is nonnegative, its leftmost byte controls the way its 
right-hand half will be inserted into the program. Let’s call this byte the “ropcode.” 
A ropcode of 0 simply inserts the instruction into the execution stream; a ropcode 
of 1 is similar, but it substitutes rY and rZ for the two operands, assuming that this 
makes sense for the operation considered. 

Ropcode 2 inserts a command that sets $X to rZ, where X is the second byte in the 
right half of rX. This ropcode is normally used with forced-trap emulations, so that 
the result of an emulated instruction is placed into the correct register. It also uses the 
third-from-left byte of rX to raise any or all of the arithmetic exceptions DVWIOUZX, 
at the same time as rZ is being placed in $X. Emulated instructions and explicit TRAP 
commands can therefore cause overflow, say, just as ordinary instructions can. (Such 
new exceptions may, of course, spawn a trip interrupt, if any of the corresponding 
bits are enabled in rA.) 

Finally, ropcode 3 is the same as ropcode 0, except that it also tells MMIX to treat 
rZ as the page table entry for the virtual address rY. (See the discussion of virtual 
address translation below.) Ropcodes greater than 3 are not permitted; moreover, 
only RESUME 1 is allowed to use ropcode 3.

The ropcode rules in the previous paragraphs should of course be understood to
involve rWW, rXX, rYY, and rZZ instead of rW, rX, rY, and rZ when the ropcode 
is seen by RESUME 1. Thus, in particular, ropcode 3 always applies to rYY and rZZ, 
never to rY and rZ. 
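
The way RESUME interprets the execution register can be sketched as a small decoder. This is an illustration only: resume_plan and the dictionary keys are hypothetical names, the argument stands for rX when z = 0 and for rXX when z = 1, and the further restrictions on which instructions may be inserted are not checked.

```python
def resume_plan(rX, z=0):
    """Describe what RESUME z would do with execution register rX (or rXX)."""
    if z > 1:
        raise ValueError("values of Z greater than 1 are reserved")
    if rX >> 63:
        # negative rX: ignore it and continue at rW (or rWW)
        return {"action": "continue"}
    ropcode = rX >> 56                   # leftmost byte
    instruction = rX & 0xFFFFFFFF        # right-hand half
    if ropcode > 3 or (ropcode == 3 and z != 1):
        raise ValueError("ropcode not permitted")
    plan = {"ropcode": ropcode, "instruction": instruction}
    if ropcode == 2:
        plan["x"] = (instruction >> 16) & 0xFF    # second byte of right half
        plan["exceptions"] = (rX >> 40) & 0xFF    # third-from-left byte of rX
    return plan
```

For example, an rX of #02004000040A0102 asks RESUME to set $10 to rZ while raising the V exception.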

Special restrictions must hold if resumption is to work properly: Ropcodes 0 and 3 
must not insert a RESUME instruction; ropcode 1 must insert a “normal” instruction, 
namely one whose opcode begins with one of the hexadecimal digits #0, #1, #2, #3,
#6, #7, #C, #D, or #E. (See the opcode chart below.) Some implementations may
also allow ropcode 1 with SYNCD[I] and SYNCID[I], so that those instructions can
conveniently be interrupted. Moreover, the destination register $X used with ropcode 
1 or 2 must not be marginal. All of these restrictions hold automatically in normal 
use; they are relevant only if the programmer tries to do something tricky. 

Notice that the slightly tricky sequence 

LDA $0,Loc; PUT rW,$0; LDTU $1,Inst; PUT rX,$1; RESUME

will execute an almost arbitrary instruction Inst as if it had been in location Loc-4, 
and then will jump to location Loc (assuming that Inst doesn’t branch elsewhere).

39. Special registers. Quite a few special registers have been mentioned so far,
and MMIX actually has even more. It is time now to enumerate them all, together 
with their internal code numbers: 

rA, arithmetic status register [21];
rB, bootstrap register (trip) [0];
rC, continuation register [8];
rD, dividend register [1];
rE, epsilon register [2];
rF, failure location register [22];
rG, global threshold register [19];
rH, himult register [3];
rI, interval counter [12];
rJ, return-jump register [4];
rK, interrupt mask register [15];
rL, local threshold register [20];
rM, multiplex mask register [5];
rN, serial number [9];
rO, register stack offset [10];
rP, prediction register [23];
rQ, interrupt request register [16];
rR, remainder register [6];
rS, register stack pointer [11];
rT, trap address register [13];
rU, usage counter [17];
rV, virtual translation register [18];
rW, where-interrupted register (trip) [24];
rX, execution register (trip) [25];
rY, Y operand (trip) [26];
rZ, Z operand (trip) [27];
rBB, bootstrap register (trap) [7];
rTT, dynamic trap address register [14];
rWW, where-interrupted register (trap) [28];
rXX, execution register (trap) [29];
rYY, Y operand (trap) [30];
rZZ, Z operand (trap) [31].

In this list rG and rL are what we have been calling simply G and L; rC, rF, rI, rN,
rO, rS, rU, and rV have not been mentioned before.

40. The interval counter rI decreases by 1 on every “clock pulse” of the MMIX
processor. Thus if MMIX is running at 500 MHz, the interval counter decreases every 2
nanoseconds. It causes an interval interrupt when it reaches zero. Such interrupts
can be extremely useful for “continuous profiling” as a means of studying the
empirical running time of programs; see Jennifer M. Anderson, Lance M. Berc, Jeffrey
Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites,
Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl, ACM Transactions
on Computer Systems 15 (1997), 357-390. The interval interrupt is achieved
by setting the next-to-leftmost bit of the “machine” byte of rQ equal to 1; this is the
seventh-least-significant bit.

The usage counter rU consists of three fields (u_p, u_m, u_c), called the usage
pattern u_p, the usage mask u_m, and the usage count u_c. The most significant byte of rU
is the usage pattern; the next most significant byte is the usage mask; and the
remaining 48 bits are the usage count. Whenever an instruction whose OP ∧ u_m = u_p
has been executed, the value of u_c increases by 1 (modulo 2^48). Thus, for example,
the OP-code chart below implies that all instructions are counted if u_p = u_m = 0;
all loads and stores are counted together with GO and PUSHGO if u_p = (10000000)_2
and u_m = (11000000)_2; all floating point instructions are counted together with fixed
point multiplications and divisions if u_p = 0 and u_m = (11100000)_2; fixed point
multiplications and divisions alone are counted if u_p = (00011000)_2 and u_m = (11111000)_2;
completed subroutine calls are counted if u_p = POP and u_m = (11111111)_2. Instructions
in negative locations, which belong to the operating system, are exceptional:
They are included in the usage count only if the leading bit of u_c is 1.
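
The counting rule can be expressed compactly in Python. This is a sketch, with the hypothetical name count_instruction; it takes "the leading bit of u_c" to mean the most significant of the 48 count bits.

```python
MASK48 = (1 << 48) - 1

def count_instruction(rU, opcode, negative_location=False):
    """Return the new value of rU after one instruction with the given opcode."""
    u_p = rU >> 56                 # usage pattern: most significant byte
    u_m = (rU >> 48) & 0xFF        # usage mask: next byte
    u_c = rU & MASK48              # usage count: remaining 48 bits
    if negative_location and not (u_c >> 47):
        return rU                  # operating system code is not counted
    if (opcode & u_m) == u_p:
        u_c = (u_c + 1) & MASK48   # increase modulo 2^48
    return (u_p << 56) | (u_m << 48) | u_c
```

With u_p = (10000000)_2 and u_m = (11000000)_2, every opcode from #80 to #BF is counted and no other.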

Incidentally, the 64-bit counter rI can be implemented rather cheaply with only
two levels of logic, using an old trick called “carry-save addition” [see, for example,
G. Metze and J. E. Robertson, Proc. International Conf. Information Processing
(Paris: 1959), 389-396]. One nice embodiment of this idea is to represent a binary
number x in a redundant form as the difference x' - x'' of two binary numbers. Any
two such numbers can be added without carry propagation as follows: Let

f(x, y, z) = (x ∧ ¬y) ∨ (x ∧ z) ∨ (¬y ∧ z),    g(x, y, z) = x ⊕ y ⊕ z.

Then it is easy to check that x - y + z = 2f(x, y, z) - g(x, y, z); we need only verify
this in the eight cases when x, y, and z are 0 or 1. Thus we can subtract 1 from a
counter x' - x'' by setting

(x', x'') ← (f(x', x'', -1) << 1, g(x', x'', -1));

we can add 1 by setting (x', x'') ← (g(x'', x', -1), f(x'', x', -1) << 1). The result is
zero if and only if x' = x''. We need not actually compute the difference x' - x''
until we need to examine the register. The computation of f(x, y, z) and g(x, y, z)
is particularly simple in the special cases z = 0 and z = -1. A similar trick works
for rU, but extra care is needed in that case because several instructions might finish
at the same time. (Thanks to Frank Yellin for his improvements to this paragraph.)
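
The redundant counter can be tried out in Python. This sketch models a 64-bit counter as a pair (x', x'') and takes f to be the form for which the identity x - y + z = 2f - g holds bit by bit, namely with the middle argument complemented; the function names are hypothetical, and -1 is represented by the all-ones word MASK.

```python
MASK = (1 << 64) - 1   # the 64-bit word of all 1s, representing -1

def f(x, y, z):
    # bitwise "carry" for x - y + z, with y complemented
    return (x & ~y | x & z | ~y & z) & MASK

def g(x, y, z):
    # bitwise sum without carries
    return x ^ y ^ z

def decrement(xp, xpp):
    """Subtract 1 from the counter xp - xpp without carry propagation."""
    return (f(xp, xpp, MASK) << 1) & MASK, g(xp, xpp, MASK)

def increment(xp, xpp):
    """Add 1 to the counter xp - xpp without carry propagation."""
    return g(xpp, xp, MASK), (f(xpp, xp, MASK) << 1) & MASK

def value(xp, xpp):
    """The ordinary binary value, computed only when it is needed."""
    return (xp - xpp) & MASK
```

Each step uses only bitwise operations and a one-place shift; the subtraction in value is the only place where carries propagate.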

41. The special serial number register rN is permanently set to the time this
particular instance of MMIX was created (measured as the number of seconds since 
00:00:00 Greenwich Mean Time on 1 January 1970), in its five least significant bytes. 
The three most significant bytes are permanently set to the version number of the 
MMIX architecture that is being implemented together with two additional bytes that 
modify the version number. This quantity serves as an essentially unique identification 
number for each copy of MMIX. 
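
A decoder for rN might look like the following sketch. The name decode_rN is hypothetical, and reading the three leading bytes as a triple such as (1, 0, 0) for Version 1.0.0 is an assumption about how the version number and its two modifying bytes are laid out.

```python
from datetime import datetime, timezone

def decode_rN(rN):
    """Split rN into its three version bytes and its creation time."""
    version = (rN >> 56, (rN >> 48) & 0xFF, (rN >> 40) & 0xFF)
    seconds = rN & ((1 << 40) - 1)      # five least significant bytes
    created = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return version, created
```

The five-byte time field holds seconds since the beginning of 1970, so it is converted here with the standard library's timestamp routine.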

Version 1.0.0 of the architecture is described in the present document. Version 1.0.1 
is similar, but simplified to avoid the complications of pipelines and operating systems. 
Other versions may become necessary in the future. 

42. The register stack offset rO and register stack pointer rS are especially
interesting, because they are used to implement MMIX’s register stack S[0], S[1], S[2], ....

The operating system initializes a register stack by assigning a large area of virtual
memory to each running process, beginning at an address like #6000000000000000. If
this starting address is σ, stack entry S[k] will go into the octabyte M8[σ + 8k]. Stack
underflow will be detected because the process does not have permission to read from
M[σ - 1]. Stack overflow will be detected because something will give out (either the
user’s budget or the user’s patience or the user’s swap space) long before 2^61 bytes
of virtual memory are filled by a register stack.

The MMIX hardware maintains the register stack by having two banks of 64-bit
general-purpose registers, one for globals and one for locals. The global registers
g[32], g[33], ..., g[255] are used for register numbers that are ≥ G in MMIX commands;
recall that G is always 32 or more. The local registers come from another array that
contains 2^n registers for some n where 8 ≤ n ≤ 10; for simplicity of exposition we will
assume that there are exactly 512 local registers, but there may be only 256 or there
may be 1024.

The local register slots l[0], l[1], ..., l[511] act as a cyclic buffer with addresses that
wrap around mod 512, so that l[512] = l[0], l[513] = l[1], etc. This buffer is divided
into three parts by three pointers, which we will call α, β, and γ.

Registers l[α], l[α + 1], ..., l[β - 1] are what program instructions currently call $0,
$1, ..., $(L - 1); registers l[β], l[β + 1], ..., l[γ - 1] are currently unused; and registers
l[γ], l[γ + 1], ..., l[α - 1] contain items of the register stack that have been pushed
down but not yet stored in memory. Special register rS holds the virtual memory
address where l[γ] will be stored, if necessary. Special register rO holds the address
where l[α] will be stored; this always equals 8τ plus the address of S[0]. We can
deduce the values of α, β, and γ from the contents of rL, rO, and rS, because

α = (rO/8) mod 512, β = (α + rL) mod 512, and γ = (rS/8) mod 512.
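
These three formulas translate directly into code. The following Python sketch uses the hypothetical names ring_pointers and ring_usage, and treats the ring size as a parameter because the text allows 256, 512, or 1024 local registers.

```python
def ring_pointers(rO, rS, rL, ring_size=512):
    """Compute the ring positions (alpha, beta, gamma) from rO, rS, and rL."""
    alpha = (rO // 8) % ring_size
    beta = (alpha + rL) % ring_size
    gamma = (rS // 8) % ring_size
    return alpha, beta, gamma

def ring_usage(rO, rS, rL, ring_size=512):
    """Count the slots in each of the three parts of the ring."""
    alpha, beta, gamma = ring_pointers(rO, rS, rL, ring_size)
    current = rL                                # $0 ... $(L-1)
    unused = (gamma - beta) % ring_size         # l[beta] ... l[gamma-1]
    pushed = (alpha - gamma) % ring_size        # pushed down, not yet in memory
    return current, unused, pushed
```

The three counts always add up to the ring size, provided β has not been allowed to reach γ.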

To maintain this situation we need to make sure that the pointers α, β, and γ never
move past each other. A PUSHJ or PUSHGO operation simply advances α toward β, so
it is very simple. The first part of a POP operation, which moves β toward α, is also
very simple. But the next part of a POP requires α to move downward, and memory
accesses might be required. MMIX will decrease rS by 8 (thereby decreasing γ by 1)
and set l[γ] ← M8[rS], one or more times if necessary, to keep α from decreasing
past γ. Similarly, the operation of increasing L may cause MMIX to set M8[rS] ← l[γ]
and increase rS by 8 (thereby increasing γ by 1) one or more times, to keep β from
increasing past γ. (Actually β is never allowed to increase to the point where it
becomes equal to γ.) If many registers need to be loaded or stored at once, these
operations are interruptible.

[A somewhat similar scheme was introduced by David R. Ditzel and H. R. McLellan 
in SIGPLAN Notices 17,4 (April 1982), 48-56, and incorporated in the so-called 
CRISP architecture developed at AT&T Bell Labs. An even more similar scheme 
was adopted in the late 1980s by Advanced Micro Devices, in the processors of their 
Am29000 series — a family of computers whose instructions have essentially the format 
‘OP X Y Z’ used by MMIX.] 

Limited versions of MMIX, having fewer registers, can also be envisioned. For
example, we might have only 32 local registers l[0], l[1], ..., l[31] and only 32 global
registers g[224], g[225], ..., g[255]. Such a machine could run any MMIX program that
maintains the inequalities L < 32 and G ≥ 224.

43. Access to MMIX’s special registers is obtained via the GET and PUT commands.

• GET $X,Z ‘get from special register’; the Y field must be zero.

Register X is set to the contents of the special register identified by its code number Z,
using the code numbers listed earlier. An illegal instruction interrupt occurs if Z ≥ 32.

Every special register is readable; MMIX does not keep secrets from an inquisitive 
user. But of course only the operating system is allowed to change registers like rK 
and rQ (the interrupt mask and request registers). And not even the operating system
is allowed to change rN (the serial number) or the stack pointers rO and rS. 

• PUT X,$Z|Z ‘put into special register’; the Y field must be zero.

The special register identified by X is set to the contents of register Z or to the
unsigned byte Z itself, if permissible. Some changes are, however, impermissible: Bits
of rA that are always zero must remain zero; the leading seven bytes of rG and rL
must remain zero, and rL must not exceed rG; special registers 9-11 (namely rN, rO,
and rS) must not change; special registers 8 and 12-18 (namely rC, rI, rK, rQ, rT, rU,
rV, and rTT) can be changed only if the privilege bit of rK is zero; and certain bits
of rQ (depending on available hardware) might not allow software to change them
from 0 to 1. Moreover, any bits of rQ that have changed from 0 to 1 since the most
recent GET x,rQ will remain 1 after PUT rQ,z. The PUT command will not increase rL;
it sets rL to the minimum of the current value and the new value. (A program should
say SETL $99,0 instead of PUT rL,100 when rL is known to be less than 100.)

Impermissible PUT commands cause an illegal instruction interrupt, or (in the case
of rC, rI, rK, rQ, rT, rU, rV, and rTT) a privileged operation interrupt.
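
The register-number part of these rules can be captured in a short Python sketch. The names classify_put and put_rL are hypothetical, and the sketch deliberately ignores the value-dependent restrictions (the zero bits of rA, the bounds on rG and rL, and the special behavior of rQ).

```python
READ_ONLY = {9, 10, 11}                    # rN, rO, rS
PRIVILEGED = {8} | set(range(12, 19))      # rC and special registers 12-18

def classify_put(code, privilege_bit_of_rK):
    """Say how PUT treats the special register with the given code number."""
    if code in READ_ONLY:
        return "illegal instruction interrupt"
    if code in PRIVILEGED and privilege_bit_of_rK:
        return "privileged operation interrupt"
    return "allowed"   # still subject to the value restrictions not modeled here

def put_rL(current_rL, value):
    """PUT never increases rL; it keeps the smaller of the two values."""
    return min(current_rL, value)
```

Thus PUT rL,100 leaves rL unchanged when rL is less than 100, which is why the text recommends a different idiom for increasing L.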

• SAVE $X,0 ‘save process state’; UNSAVE 0,$Z ‘restore process state’; the Y field 
must be 0, and so must the Z field of SAVE, the X field of UNSAVE. 

The SAVE instruction stores all registers and special registers that might affect the 
computation of the currently running process. First the current local registers $0, $1, 
..., $(L - 1) are pushed down as in PUSHGO $255, and L is set to zero. Then the
current global registers $G, $(G + 1), . . . , $255 are placed above them in the register 
stack; finally rB, rD, rE, rH, rJ, rM, rR, rP, rW, rX, rY, and rZ are placed at the 
very top, followed by registers rG and rA packed into eight bytes:

[Figure: octabyte layout, from left to right: rG (8 bits) | 0 (24 bits) | rA (32 bits)]

The address of the topmost octabyte is then placed in register X, which must be a 
global register. (This instruction is interruptible. If an interrupt occurs while the 
registers are being saved, we will have α = β = γ in the ring of local registers; thus
rO will equal rS and rL will be zero. The interrupt handler essentially has a new 
register stack, starting on top of the partially saved context.) Immediately after a 
SAVE the values of rO and rS are equal to the location of the first byte following the 
stack just saved. The current register stack is effectively empty at this point; thus 
one shouldn’t do a POP until this context or some other context has been unsaved.

The UNSAVE instruction goes the other way, restoring all the registers when given
an address in register Z that was returned by a previous SAVE. Immediately after 
an UNSAVE the values of rO and rS will be equal. Like SAVE, this instruction is 
interruptible. 
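
The final octabyte written by SAVE, which packs rG and rA together, can be modeled as follows. This is a sketch that assumes rG occupies the leading byte, followed by 24 zero bits and then rA in the low 32 bits; the function names are hypothetical.

```python
def pack_rG_rA(rG, rA):
    """Pack rG and rA into the octabyte at the top of a saved context."""
    if not 0 <= rG < (1 << 8):
        raise ValueError("rG must fit in one byte")
    if not 0 <= rA < (1 << 32):
        raise ValueError("rA must fit in 32 bits")
    return (rG << 56) | rA     # the 24 bits in between stay zero

def unpack_rG_rA(octa):
    """Recover (rG, rA), as UNSAVE would need to do."""
    return octa >> 56, octa & 0xFFFFFFFF
```

A round trip through the two functions returns the original pair, which is the property UNSAVE relies on.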

The operating system uses SAVE and UNSAVE to switch context between different 
processes. It can also use UNSAVE to establish suitable initial values of rO and rS. But 
a user program that knows what it is doing can in fact allocate its own register stack 
or stacks and do its own process switching. 

Caution: UNSAVE is destructive, in the sense that a program can’t reliably UNSAVE
twice from the same saved context. Once an UNSAVE has been done, further operations 
are likely to change the memory record of what was saved. Moreover, an interrupt 
during the middle of an UNSAVE may have already clobbered some of the data in 
memory before the UNSAVE has completely finished, although the data will appear 
properly in all registers.

44. Virtual and physical addresses. Virtual 64-bit addresses are converted to
physical addresses in a manner governed by the special virtual translation register rV.
Thus M[A] really refers to m[φ(A)], where m is the physical memory array and φ(A)
is determined by the physical mapping function φ. The details of this conversion are
rather technical and of interest mainly to the operating system, but two simple rules 
are important to ordinary users: 

• Negative addresses are mapped directly to physical addresses, by simply suppressing 
the sign bit: 

    φ(A) = A + 2^63 = A ∧ #7fffffffffffffff, if A < 0.

All accesses to negative addresses are privileged, for use by the operating system only. 
(Thus, for example, the trap addresses in rT and rTT should be negative, because they 
are addresses inside the operating system.) Moreover, all physical addresses ≥ 2^48
are intended for use by memory-mapped I/O devices; values read from or written to 
such locations are never placed in a cache. 

• Nonnegative addresses belong to four segments, depending on whether the three
leading bits are 000, 001, 010, or 011. These 2^61-byte segments are traditionally used
for a program’s text, data, dynamic memory, and register stack, respectively, but such
conventions are not mandatory. There are four mappings φ0, φ1, φ2, and φ3 of 61-bit
addresses into 48-bit physical memory space, one for each segment:

    φ(A) = φ_⌊A/2^61⌋(A mod 2^61), if 0 ≤ A < 2^63.

In general, the machine is able to access smaller addresses of a segment more efficiently 
than larger addresses. Thus a programmer should let each segment grow upward from 
zero, trying to keep any of the 61-bit addresses from becoming larger than necessary, 
although arbitrary addresses are legal. 
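
The two rules above can be sketched in C. This is only an illustration, assuming virtual addresses are held in a uint64_t; the function names are hypothetical and are not part of MMIXware.

```c
#include <stdint.h>

/* Nonzero when the address is "negative" (sign bit set), hence privileged. */
int is_negative_address(uint64_t a) {
    return (int)(a >> 63);
}

/* Physical address of a negative virtual address: suppress the sign bit. */
uint64_t phys_of_negative(uint64_t a) {
    return a & UINT64_C(0x7fffffffffffffff);
}

/* Segment number 0..3 of a nonnegative address (leading bits 000..011). */
int segment_of(uint64_t a) {
    return (int)(a >> 61);
}

/* The 61-bit address within the segment, namely A mod 2^61. */
uint64_t segment_offset(uint64_t a) {
    return a & ((UINT64_C(1) << 61) - 1);
}
```

For instance, address #6000000000000010 lies in segment 3 (traditionally the register stack) at offset #10.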

45. Now it’s time for the technical details of virtual address translation. The 
mappings φ0, φ1, φ2, and φ3 are defined by the following rules.

(1) The first two bytes of rV are four nybbles called b1, b2, b3, b4; we also define
b0 = 0. Segment i has at most 1024^(b_{i+1} − b_i) pages. In particular, segment i must have
at most one page when b_i = b_{i+1}, and it must be entirely empty if b_i > b_{i+1}.

(2) The next byte of rV, s, specifies the current page size, which is 2^s bytes. We
must have s ≥ 13 (hence at least 8192 bytes per page). Values of s larger than, say,
20 or so are of use only in rather large programs that will reside in main memory for 
long periods of time, because memory protection and swapping are applied to entire 
pages. The maximum legal value of s is 48. 

(3) The remaining five bytes of rV are a 27-bit root location r, a 10-bit address
space number n, and a 3-bit function field f:

    rV = [ b1 | b2 | b3 | b4 | s | r | n | f ]   (4, 4, 4, 4, 8, 27, 10, and 3 bits)

Normally f = 0; if f = 1, virtual address translation will be done by software instead
of hardware, and the b1, b2, b3, b4, and r fields of rV will be ignored by the hardware.
(Values of f > 1 are reserved for possible future use; if f > 1 when MMIX tries to
translate an address, a memory-protection failure will occur.)

(4) Each page has an 8-byte page table entry (PTE), which looks like this:

    PTE = [ x | a | y | n | p ]   (16, 48 − s, s − 13, 10, and 3 bits)

Here x and y are ignored (thus they are usable for any purpose by the operating
system); 2^s a is the physical address of byte 0 on the page; and n is the address space
number (which must match the number in rV). The final three bits are the protection
bits p_r p_w p_x; the user needs p_r = 1 to load from this page, p_w = 1 to store on this
page, and p_x = 1 to execute instructions on this page. If n fails to match the number
in rV, or if the appropriate protection bit is zero, a memory-protection fault occurs.

Page table entries should be writable only by the operating system. The 16 ignored 
bits of x imply that physical memory size is limited to 2^48 bytes (namely 256 large
terabytes); that should be enough capacity for a while, if not for the entire new
millennium. 

(5) A given 61-bit address A belongs to page ⌊A/2^s⌋ of its segment, and

    φ_i(A) = 2^s a + (A mod 2^s)

if a is the address in the PTE for page ⌊A/2^s⌋ of segment i.

(6) Suppose ⌊A/2^s⌋ is equal to (a4 a3 a2 a1 a0)_1024 in the radix-1024 number system.
In the common case a4 = a3 = a2 = a1 = 0, the PTE is simply the octabyte
m8[2^13(r + b_i) + 8a0]; this rule defines the mapping for the first 1024 pages. The next
million or so pages are accessed through an auxiliary page table pointer

    PTP = [ 1 | c | n | q ]   (1, 50, 10, and 3 bits)

in m8[2^13(r + b_i + 1) + 8a1]; here the sign must be 1 and the n-field must match rV, but
the q bits are ignored. The desired PTE for page (a1 a0)_1024 is then in m8[2^13 c + 8a0].
The next billion or so pages, namely the pages (a2 a1 a0)_1024 with a2 ≠ 0, are accessed
similarly, through an auxiliary PTP at level two; and so on.

Notice that if b3 = b4, there is just one page in segment 3, and its PTE appears all
alone in physical location 2^13(r + b3). Otherwise the PTEs appear in 1024-octabyte
blocks. We usually have 0 < b1 < b2 < b3 < b4, but the null case b1 = b2 = b3 = b4 = 0
is worthy of mention: In this special case there is only one page, and the segment bits
of a virtual address are ignored; the other 61 − s bits of each virtual address must be
zero.

If s = 13, b1 = 3, b2 = 2, b3 = 1, and b4 = 0, there are at most 2^30 pages of 8192
bytes each, all belonging to segment 0. This is essentially the virtual memory setup
in the Alpha 21064 computers with DIGITAL UNIX™.
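
As an illustration of rules (1), (2), (3), and (6), here is a minimal C sketch that unpacks the fields of rV, extracts the radix-1024 digits of a page number, and locates the PTE in the common case of the first 1024 pages. The function names are hypothetical; the bit positions follow the field widths stated in the rules (four nybbles, then s, then 27 + 10 + 3 bits).

```c
#include <stdint.h>

/* b_i for i = 0..4; b_0 is defined to be zero. */
unsigned rv_b(uint64_t rv, int i) {
    return i == 0 ? 0u : (unsigned)(rv >> (64 - 4 * i)) & 0xf;
}
unsigned rv_s(uint64_t rv) { return (unsigned)(rv >> 40) & 0xff; }      /* page size 2^s */
uint32_t rv_r(uint64_t rv) { return (uint32_t)(rv >> 13) & 0x7ffffff; } /* root location */
unsigned rv_n(uint64_t rv) { return (unsigned)(rv >> 3) & 0x3ff; }      /* address space */
unsigned rv_f(uint64_t rv) { return (unsigned)rv & 7; }                 /* function field */

/* Digit a_k (k = 0..4) of the page number floor(A / 2^s) in radix 1024,
   where A is a 61-bit address within its segment and 13 <= s <= 48. */
unsigned page_digit(uint64_t a, unsigned s, int k) {
    return (unsigned)((a >> s) >> (10 * k)) & 0x3ff;
}

/* Physical location 2^13 (r + b_i) + 8 a0 of the PTE for one of the
   first 1024 pages of segment i. */
uint64_t first_level_pte_address(uint64_t rv, int i, unsigned a0) {
    return ((uint64_t)(rv_r(rv) + rv_b(rv, i)) << 13) + 8 * (uint64_t)a0;
}
```

With rV = #12340d0000200028 (b1 to b4 = 1, 2, 3, 4; s = 13; r = #100; n = 5; f = 0), the PTE for page 3 of segment 0 lies at physical location #200018.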

Several special cases have weird behavior, which probably isn’t going to be useful. 
But I might as well mention them so that the flexibility of this scheme is clarified: If, 
for example, b1 = 2, b2 = b3 = 1, and b4 = 5, then r + 1 is used both for PTPs of
segment 0 and PTEs of segment 2. And if b2 = b3 < b4, then r + b2 is used for the
PTE of page 0 of segments 2 and 3; page 1 of segment 2 is not allowed, but there is a
page 1 in segment 3. 

I know these rules look extremely complicated, and I sincerely wish I could have 
found an alternative that would be both simple and efficient in practice. I tried 
various schemes based on hashing, but came to the conclusion that “trie” methods 
such as those described here are better for this application. Indeed, the page tables in 
most contemporary computers are based on very similar ideas, but with significantly 
smaller virtual addresses and without the shortcut for small page numbers. I tried 
also to find formats for rV and the page tables that would match byte boundaries in
a more friendly way, but the corresponding page sizes did not work well. Fortunately 
these grungy details are almost always completely hidden from ordinary users. 
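
Once a PTE has been found, the final step of rule (5) is simple. The following C sketch is an illustration only: it assumes the PTE layout described in rule (4), with the physical page address 2^s a occupying bits s through 47, and it uses a hypothetical sentinel PTE_FAULT to stand for a memory-protection fault.

```c
#include <stdint.h>

enum { PTE_EXEC = 1, PTE_WRITE = 2, PTE_READ = 4 };   /* p_x, p_w, p_r */
#define PTE_FAULT UINT64_MAX   /* hypothetical marker; real addresses are < 2^48 */

/* Map the 61-bit address a through the given PTE, for page size 2^s
   (13 <= s <= 48) and address space number n_rv taken from rV.
   'need' is the protection bit required by the access. */
uint64_t pte_translate(uint64_t pte, unsigned s, unsigned n_rv,
                       uint64_t a, unsigned need) {
    unsigned n = (unsigned)(pte >> 3) & 0x3ff;
    unsigned p = (unsigned)pte & 7;
    uint64_t in_page = (UINT64_C(1) << s) - 1;                    /* low s bits */
    uint64_t base = pte & ((UINT64_C(1) << 48) - 1) & ~in_page;   /* 2^s a */
    if (n != n_rv || (p & need) == 0) return PTE_FAULT;
    return base | (a & in_page);                                  /* 2^s a + (A mod 2^s) */
}
```

The ignored fields x and y never affect the result, because they lie outside bits s through 47.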

Stack overflow presents a potential problem: If γ increases to a virtual address on
a new page for which there is no permission to write, the protection interrupt handler 
would have no stack space in which to work! Therefore MMIX has a continuation reg- 
ister rC, which contains the physical address of a “continuation page.” Pushed-down 
information is written to the continuation page until MMIX comes to an instruction 
that is safely interruptible. Then a stack overflow interrupt occurs, and the operating 
system can restore order. The format of rC is just like an ordinary PTE entry, except 
that the n field is ignored.

46. Of course MMIX can’t afford to perform a lengthy calculation of physical addresses
every time it accesses memory. The machine therefore maintains a translation
cache (TC), which contains the translations of recently accessed pages. (In fact, there
usually are two such caches, one for instructions and one for data.) A TC holds a set
of 64-bit translation keys

    [ 1 | i | v | 0 | n | 0 ]   (1, 2, 61 − s, s − 13, 10, and 3 bits)

associated with 38-bit translations

    [ a | 0 | p ]   (48 − s, s − 13, and 3 bits)

representing the relevant parts of the PTE for page v of segment i. Different processes
typically have different values of n, and possibly also different values of s. The
operating system needs a way to keep such caches up to date when pages are being al- 
located, moved, swapped, or recycled. The operating system also likes to know which 
pages have been recently used. The LDVTS instructions facilitate such operations: 

• LDVTS $X,$Y,$Z|Z ‘load virtual translation status’. 

The sum $Y + $Z or $Y + Z should have the form of a translation cache key as above,
except that the rightmost three bits need not be zero. If this key is present in a TC, 
the rightmost three bits replace the current protection code p; however, if p is thereby 
set to zero, the key is removed from the TC. Register X is set to 0 if the key was 
not present in any translation cache, or to 1 if the key was present in the TC for
instructions, or to 2 if the key was present in the TC for data, or to 3 if the key was 
present in both. This instruction is for the operating system only. (Changes to the 
TC are not immediate; so SYNC and/or SYNCD ought to be done when appropriate, as 
discussed in MMIX-PIPE.)

47. We mentioned earlier that cheap versions of MMIX might calculate the physical
addresses with software instead of hardware, using forced traps when the operating 
system needs to do page table calculations. Here is some code that could be used 
for such purposes; it defines the translation process precisely, given a nonnegative 
virtual address in register rYY. First we must unpack the fields of rV and compute 
the relevant base addresses for PTEs and PTPs: 



        GET     virt,rYY
        GET     $7,rV           % $7=(virtual translation register)
        SRU     $1,virt,61      % $1=i (segment number of virtual address)
        SLU     $1,$1,2
        NEG     $1,52,$1        % $1=52-4i
        SRU     $1,$7,$1
        SLU     $2,$1,4
        SETL    $0,#f000
        AND     $1,$1,$0        % $1=b[i]<<12
        AND     $2,$2,$0        % $2=b[i+1]<<12
        SLU     $3,$7,24
        SRU     $3,$3,37
        SLU     $3,$3,13        % $3=(r field of rV)
        ORH     $3,#8000        % make $3 a physical address
        2ADDU   base,$1,$3      % base=address of first page table
        2ADDU   limit,$2,$3     % limit=address after last page table
        SRU     s,$7,40
        AND     s,s,#ff         % s=(s field of rV)
        CMP     $0,s,13
        BN      $0,Fail         % s must be 13 or more
        CMP     $0,s,49
        BNN     $0,Fail         % s must be 48 or less
        SETH    mask,#8000
        ORL     mask,#1ff8      % mask=(sign bit and n field)
        ORH     $7,#8000        % set sign bit for PTP validation below
        ANDNH   virt,#e000      % zero out the segment number
        SRU     $0,virt,s       % $0=a4a3a2a1a0 (page number of virt)
        ZSZ     $1,$0,1         % $1=[page number is zero]
        ADD     limit,limit,$1  % increase limit if page number is zero
        SETL    $6,#3ff





The next part of the routine finds the “digits” of the page number (a4 a3 a2 a1 a0)_1024,
from right to left:

        CMP $5,base,limit; SRU $1,$0,10; PBZ $1,1F
        AND $0,#3ff; INCL base,#2000
        CMP $5,base,limit; SRU $2,$1,10; PBZ $2,2F
        AND $1,#3ff; INCL base,#2000
        CMP $5,base,limit; SRU $3,$2,10; PBZ $3,3F
        AND $2,#3ff; INCL base,#2000
        CMP $5,base,limit; SRU $4,$3,10; PBZ $4,4F
        AND $3,#3ff; INCL base,#2000

Then the process cascades back through PTPs.

        CMP $5,base,limit
        BNN $5,Fail; 8ADDU $6,$4,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
4H      BNN $5,Fail; 8ADDU $6,$3,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
3H      BNN $5,Fail; 8ADDU $6,$2,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail
        ANDNL base,#1fff
2H      BNN $5,Fail; 8ADDU $6,$1,base; LDO base,$6,0
        XOR $6,base,$7; AND $6,$6,mask; BNZ $6,Fail

Finally we obtain the PTE and communicate it to the machine. If errors have been 
detected, we set the translation to zero; actually any translation with permission bits 
zero would have the same effect. 





        ANDNL   base,#1fff      % remove low 13 bits of PTP
1H      BNN     $5,Fail
        8ADDU   $6,$0,base
        LDO     base,$6,0       % base=PTE
        XOR     $6,base,$7
        ANDN    $6,$6,#7
        SLU     $6,$6,51
        PBZ     $6,Ready        % branch if n matches
Fail    SETL    base,0          % errors lead to PTE of zero
Ready   PUT     rZZ,base
        LDO     $255,IntMask    % load the desired setting of rK
        RESUME  1               % now the machine will digest the translation



All loads and stores in this program deal with negative virtual addresses. This effec- 
tively shuts off memory mapping and makes the page tables inaccessible to the user. 

The program assumes that the ropcode in rXX is 3 (which it is when a forced trap 
is triggered by the need for virtual translation). 

The translation from virtual pages to physical pages need not actually follow the 
rules for PTPs and PTEs; any other mapping could be substituted by operating 
systems with special needs. But people usually want compatibility between different 
implementations whenever possible. The only parts of rV that MMIX really needs are 
the s field, which defines page sizes, and the n field, which keeps TC entries of one 
process from being confused with the TC entries of another.

48. The complete instruction set. We have now described all of MMIX’s special
registers — except one: The special failure location register rF is set to a physical 
memory address when a parity error or other memory fault occurs. (The instruction 
leading to this error will probably be long gone before such a fault is detected; for 
example, the machine might be trying to write old data from a cache in order to make 
room for new data. Thus there is generally no connection between the current virtual 
program location rW and the physical location of a memory error. But knowledge 
of the latter location can still be useful for hardware repair, or when an operating 
system is booting up.) 

49. One additional instruction proves to be useful. 

• SWYM X,Y,Z ‘sympathize with your machinery’. 

This command lubricates the disk drives, fans, magnetic tape drives, laser printers, 
scanners, and any other mechanical equipment hooked up to MMIX, if necessary. Fields 
X, Y, and Z are ignored. 

The SWYM command was originally included in MMIX’s repertoire because machines 
occasionally need grease to keep in shape, just as human beings occasionally need 
to swim or do some other kind of exercise in order to maintain good muscle tone. 
But in fact, SWYM has turned out to be a “no-op,” an instruction that does nothing 
at all; the hypothetical manufacturers of our hypothetical machine have pointed out 
that modern computer equipment is already well oiled and sealed for permanent use. 
Even so, a no-op instruction provides a good way for software to send signals to the 
hardware, for such things as scheduling the way instructions are issued on superscalar 
superpipelined buzzword-compliant machines. Software programs can also use no-ops 
to communicate with other programs like symbolic debuggers. 

When a forced trap computes the translation rZZ of a virtual address rYY, rop- 
code 3 of RESUME 1 will put (rYY, rZZ) into the TC for instructions if the opcode 
in rXX is SWYM; otherwise (rYY, rZZ) will be put into the TC for data. 

50. The running time of MMIX programs depends to a great extent on changes in 
technology. MMIX is a mythical machine, but its mythical hardware exists in cheap, 
slow versions as well as in costly high-performance models. Details of running time 
usually depend on things like the amount of main memory available to implement 
virtual memory, as well as the sizes of caches and other buffers. 

For practical purposes, the running time of an MMIX program can often be estimated
satisfactorily by assigning a fixed cost to each operation, based on the approximate
running time that would be obtained on a high-performance machine with lots of
main memory; so that’s what we will do. Each operation will be assumed to take
an integer number of υ, where υ (pronounced “oops”) is a unit that represents the
clock cycle time in a pipelined implementation. The value of υ will probably decrease
from year to year, but I’ll keep calling it υ. The running time will also depend on
the number of memory references or mems that a program uses; this is the number
of load and store instructions. For example, each LDO (load octa) instruction will be
assumed to cost μ + υ, where μ is the average cost of a memory reference. The total
running time of a program might be reported as, say, 35μ + 1000υ, meaning 35 mems
plus 1000 oops. The ratio μ/υ will probably increase with time, so mem-counting is
likely to become increasingly important. [See the discussion of mems in The Stanford
GraphBase (New York: ACM Press, 1994).]

Integer addition, subtraction, and comparison all take just 1υ. The same is true
for SET, GET, PUT, SYNC, and SWYM instructions, as well as bitwise logical operations,
shifts, relative jumps, comparisons, conditional assignments, and correctly predicted
branches-not-taken or probable-branches-taken. Mispredicted branches or probable
branches cost 3υ, and so do the POP and GO commands. Integer multiplication takes
10υ; integer division weighs in at 60υ. TRAP, TRIP, and RESUME cost 5υ each.

Most floating point operations have a nominal running time of 4υ, although the
comparison operators FCMP, FEQL, and FUN need only 1υ. FDIV and FSQRT cost 40υ
each. The actual running time of floating point computations will vary depending on
the operands; for example, the machine might need one extra υ for each subnormal
input or output, and it might slow down greatly when trips are enabled. The FREM
instruction might typically cost (3 + δ)υ, where δ is the amount by which the exponent
of the first operand exceeds the exponent of the second (or zero, if this amount is
negative). A floating point operation might take only 1υ if at least one of its operands
is zero, infinity, or NaN. However, the fixed values stated at the beginning of this
paragraph will be used for all seat-of-the-pants estimates of running time, since we
want to keep the estimates as simple as possible without making them terribly out of
line.

All load and store operations will be assumed to cost μ + υ, except that CSWAP
costs 2μ + 2υ. (This applies to all OP codes that begin with #8, #9, #A, and #B,
except #98–#9F and #B8–#BF. It’s best to keep the rules simple, because μ is just
an approximate device for estimating average memory cost.) SAVE and UNSAVE are
charged 20μ + υ.
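
The fixed costs just listed are easy to tabulate. Here is a small C sketch of such a tally, assuming a running time is recorded as a pair (mems, oops); the type and the operation classes are hypothetical names chosen for this illustration.

```c
/* A running time of the form (mems)μ + (oops)υ. */
typedef struct { unsigned long mems, oops; } mmix_cost;

typedef enum {
    OP_SIMPLE,          /* add, subtract, compare, shifts, predicted branches: 1υ */
    OP_MISPREDICT,      /* mispredicted branch, POP, GO: 3υ */
    OP_MUL,             /* integer multiplication: 10υ */
    OP_DIV,             /* integer division: 60υ */
    OP_TRAP,            /* TRAP, TRIP, RESUME: 5υ */
    OP_FLOAT,           /* most floating point operations: 4υ */
    OP_FDIV_FSQRT,      /* FDIV, FSQRT: 40υ */
    OP_LOAD_STORE,      /* ordinary loads and stores: μ + υ */
    OP_CSWAP,           /* CSWAP: 2μ + 2υ */
    OP_SAVE_UNSAVE      /* SAVE, UNSAVE: 20μ + υ */
} op_class;

mmix_cost cost_of(op_class c) {
    static const mmix_cost table[] = {
        {0, 1}, {0, 3}, {0, 10}, {0, 60}, {0, 5},
        {0, 4}, {0, 40}, {1, 1}, {2, 2}, {20, 1}
    };
    return table[c];
}

mmix_cost add_cost(mmix_cost a, mmix_cost b) {
    mmix_cost t = { a.mems + b.mems, a.oops + b.oops };
    return t;
}

mmix_cost scale_cost(mmix_cost a, unsigned long times) {
    mmix_cost t = { a.mems * times, a.oops * times };
    return t;
}
```

A program that executes 35 loads and 965 simple one-cycle operations would thus be charged 35μ + 1000υ.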

Of course we must remember that these numbers are very rough. We have not 
included the cost of fetching instructions from memory. Furthermore, an integer 
multiplication or division might have an effective cost of only 1υ, if the result is not
needed while other numbers are being calculated. Only a detailed simulation can be 
expected to be truly realistic.

51. If you think that MMIX has plenty of operation codes, you are right; we have
now described them all. Here is a chart that shows their numeric values:





      #0        #1        #2        #3        #4        #5        #6        #7
#0x   TRAP      FCMP      FUN       FEQL      FADD      FIX       FSUB      FIXU      #0x
      FLOT[I]             FLOTU[I]            SFLOT[I]            SFLOTU[I]
#1x   FMUL      FCMPE     FUNE      FEQLE     FDIV      FSQRT     FREM      FINT      #1x
      MUL[I]              MULU[I]             DIV[I]              DIVU[I]
#2x   ADD[I]              ADDU[I]             SUB[I]              SUBU[I]             #2x
      2ADDU[I]            4ADDU[I]            8ADDU[I]            16ADDU[I]
#3x   CMP[I]              CMPU[I]             NEG[I]              NEGU[I]             #3x
      SL[I]               SLU[I]              SR[I]               SRU[I]
#4x   BN[B]               BZ[B]               BP[B]               BOD[B]              #4x
      BNN[B]              BNZ[B]              BNP[B]              BEV[B]
#5x   PBN[B]              PBZ[B]              PBP[B]              PBOD[B]             #5x
      PBNN[B]             PBNZ[B]             PBNP[B]             PBEV[B]
#6x   CSN[I]              CSZ[I]              CSP[I]              CSOD[I]             #6x
      CSNN[I]             CSNZ[I]             CSNP[I]             CSEV[I]
#7x   ZSN[I]              ZSZ[I]              ZSP[I]              ZSOD[I]             #7x
      ZSNN[I]             ZSNZ[I]             ZSNP[I]             ZSEV[I]
#8x   LDB[I]              LDBU[I]             LDW[I]              LDWU[I]             #8x
      LDT[I]              LDTU[I]             LDO[I]              LDOU[I]
#9x   LDSF[I]             LDHT[I]             CSWAP[I]            LDUNC[I]            #9x
      LDVTS[I]            PRELD[I]            PREGO[I]            GO[I]
#Ax   STB[I]              STBU[I]             STW[I]              STWU[I]             #Ax
      STT[I]              STTU[I]             STO[I]              STOU[I]
#Bx   STSF[I]             STHT[I]             STCO[I]             STUNC[I]            #Bx
      SYNCD[I]            PREST[I]            SYNCID[I]           PUSHGO[I]
#Cx   OR[I]               ORN[I]              NOR[I]              XOR[I]              #Cx
      AND[I]              ANDN[I]             NAND[I]             NXOR[I]
#Dx   BDIF[I]             WDIF[I]             TDIF[I]             ODIF[I]             #Dx
      MUX[I]              SADD[I]             MOR[I]              MXOR[I]
#Ex   SETH      SETMH     SETML     SETL      INCH      INCMH     INCML     INCL      #Ex
      ORH       ORMH      ORML      ORL       ANDNH     ANDNMH    ANDNML    ANDNL
#Fx   JMP[B]              PUSHJ[B]            GETA[B]             PUT[I]              #Fx
      POP       RESUME    SAVE      UNSAVE    SYNC      SWYM      GET       TRIP
      #8        #9        #A        #B        #C        #D        #E        #F

The notation ‘[I]’ indicates an operation with an “immediate” variant in which
the Z field denotes a constant instead of a register number. Similarly, ‘[B]’ indicates
an operation with a “backward” variant in which a relative address has a negative
displacement. Simulators and other programs that need to present MMIX instructions
in symbolic form will say that opcode #20 is ADD while opcode #21 is ADDI; they will
say that #F2 is PUSHJ while #F3 is PUSHJB. But the MMIX assembler uses only the
forms ADD and PUSHJ, not ADDI or PUSHJB.

To read this chart, use the hexadecimal digits at the top, bottom, left, and right.
For example, operation code A9 in hexadecimal notation appears in the lower part of
the #Ax row and in the #1/#9 column; it is STTI, ‘store tetrabyte immediate’.
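
The chart lookup can be expressed arithmetically. The following C sketch, with hypothetical function names, splits an opcode into its chart coordinates; the last two functions are meaningful only for entries marked [I] or [B], which occupy an even/odd pair of opcodes.

```c
#include <stdint.h>

/* Row of the chart: the leading hexadecimal digit, as in #Ax. */
unsigned chart_row(uint8_t opcode) { return opcode >> 4; }

/* Nonzero when the opcode sits in the lower part of its row (#x8 to #xF). */
int chart_lower_part(uint8_t opcode) { return (opcode >> 3) & 1; }

/* Column 0..7; column k is labeled #k at the top and #(k+8) at the bottom. */
unsigned chart_column(uint8_t opcode) { return opcode & 7; }

/* For an entry marked [I] or [B]: the even opcode that names the pair. */
uint8_t pair_base(uint8_t opcode) { return (uint8_t)(opcode & 0xfe); }

/* For an entry marked [I] or [B]: nonzero for the immediate or backward variant. */
int is_variant(uint8_t opcode) { return opcode & 1; }
```

Opcode #A9 yields row #A, lower part, column 1 (the #1/#9 column), and it is the immediate variant of the pair that begins at #A8, namely STT.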



MMIX-ARITH 

1. Introduction. The subroutines below are used to simulate 64-bit MMIX arith- 
metic on an old-fashioned 32-bit computer — like the one the author had when he 
wrote MMIXAL and the first MMIX simulators in 1998 and 1999. All operations are fab- 
ricated from 32-bit arithmetic, including a full implementation of the IEEE floating 
point standard, assuming only that the C compiler has a 32-bit unsigned integer type. 

Some day 64-bit machines will be commonplace and the awkward manipulations 
of the present program will look quite archaic. Interested readers who have such 
computers will be able to convert the code to a pure 64-bit form without difficulty, 
thereby obtaining much faster and simpler routines. Meanwhile, however, we can 
simulate the future and hope for continued progress. 

This program module has a simple structure, intended to make it suitable for 
loading with MMIX simulators and assemblers. 

#include <stdio.h>
#include <string.h>
#include <ctype.h>
  ⟨Stuff for C preprocessor 2⟩
  typedef enum { false, true } bool;
  ⟨Tetrabyte and octabyte type definitions 3⟩
  ⟨Other type definitions 36⟩
  ⟨Global variables 4⟩
  ⟨Subroutines 5⟩

2. Subroutines of this program are declared first with a prototype, as in ANSI C, 
then with an old-style C function definition. Here are some preprocessor commands 
that make this work correctly with both new-style and old-style compilers. 

⟨Stuff for C preprocessor 2⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

This code is used in section 1. 

3. The definition of type tetra should be changed, if necessary, so that it represents 
an unsigned 32-bit integer. 

⟨Tetrabyte and octabyte type definitions 3⟩ ≡
  typedef unsigned int tetra;    /* for systems conforming to the LP-64 data model */
  typedef struct {
    tetra h, l;
  } octa;    /* two tetrabytes make one octabyte */

This code is used in section 1. 
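
For readers who do have a 64-bit integer type, the correspondence between an octa and a native 64-bit value can be sketched as follows. This is an illustration, not part of MMIX-ARITH; it assumes the C99 header <stdint.h>, and the conversion functions are hypothetical names.

```c
#include <stdint.h>

typedef uint32_t tetra;               /* exactly 32 bits on every C99 system */
typedef struct { tetra h, l; } octa;  /* high and low tetrabytes */

/* Combine the two halves into one 64-bit value. */
uint64_t octa_to_u64(octa x) {
    return ((uint64_t)x.h << 32) | x.l;
}

/* Split a 64-bit value into high and low tetrabytes. */
octa u64_to_octa(uint64_t v) {
    octa x;
    x.h = (tetra)(v >> 32);
    x.l = (tetra)v;
    return x;
}
```

Such conversions make it easy to check the 32-bit subroutines against native arithmetic.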

4. #define sign_bit ((unsigned) 0x80000000)

⟨Global variables 4⟩ ≡



D.E. Knuth: MMIXware, LNCS 1750, pp. 62-109, 2014. 

DOI: 10.1007/3-540-46611-8_3 © Author and Springer-Verlag Berlin Heidelberg 2014

  octa zero_octa;    /* zero_octa.h = zero_octa.l = 0 */
  octa neg_one = {-1, -1};    /* neg_one.h = neg_one.l = −1 */
  octa inf_octa = {0x7ff00000, 0};    /* floating point +∞ */
  octa standard_NaN = {0x7ff80000, 0};    /* floating point NaN(.5) */

See also sections 9, 30, 32, 69, and 75. 

This code is used in section 1. 

5. It’s easy to add and subtract octabytes, if we aren’t terribly worried about speed.

⟨Subroutines 5⟩ ≡
  octa oplus ARGS((octa, octa));
  octa oplus(y, z)    /* compute y + z */
       octa y, z;
  { octa x;
    x.h = y.h + z.h; x.l = y.l + z.l;
    if (x.l < y.l) x.h++;
    return x;
  }

  octa ominus ARGS((octa, octa));
  octa ominus(y, z)    /* compute y − z */
       octa y, z;
  { octa x;
    x.h = y.h - z.h; x.l = y.l - z.l;
    if (x.l > y.l) x.h--;
    return x;
  }

See also sections 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60, 
61, 62, 68, 82, 85, 86, 88, 89, 91, and 93. 

This code is used in section 1. 
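
The carry and borrow tests in these two subroutines can be checked against native 64-bit arithmetic. The sketch below restates oplus and ominus with ANSI prototypes so that it is self-contained; the helpers from_u64 and to_u64 are hypothetical and exist only for the comparison.

```c
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;

octa from_u64(uint64_t v) { octa x; x.h = (tetra)(v >> 32); x.l = (tetra)v; return x; }
uint64_t to_u64(octa x) { return ((uint64_t)x.h << 32) | x.l; }

octa oplus(octa y, octa z) {          /* compute y + z */
    octa x;
    x.h = y.h + z.h; x.l = y.l + z.l;
    if (x.l < y.l) x.h++;             /* the low half wrapped around: carry */
    return x;
}

octa ominus(octa y, octa z) {         /* compute y - z */
    octa x;
    x.h = y.h - z.h; x.l = y.l - z.l;
    if (x.l > y.l) x.h--;             /* the low half wrapped around: borrow */
    return x;
}
```

A carry out of the low tetrabyte occurs exactly when the wrapped sum x.l is smaller than y.l, which is what the comparison detects.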

6. In the following subroutine, delta is a signed quantity that is assumed to fit in a 
signed tetrabyte. 

⟨Subroutines 5⟩ +≡
  octa incr ARGS((octa, int));
  octa incr(y, delta)    /* compute y + δ */
       octa y;
       int delta;
  { octa x;
    x.h = y.h; x.l = y.l + delta;
    if (delta >= 0) {
      if (x.l < y.l) x.h++;
    } else if (x.l > y.l) x.h--;
    return x;
  }

7. Left and right shifts are only a bit more difficult.

⟨Subroutines 5⟩ +≡
  octa shift_left ARGS((octa, int));
  octa shift_left(y, s)    /* shift left by s bits, where 0 ≤ s ≤ 64 */
       octa y;
       int s;
  {
    while (s >= 32) y.h = y.l, y.l = 0, s -= 32;
    if (s) { register tetra yhl = y.h << s, ylh = y.l >> (32 - s);
      y.h = yhl + ylh; y.l <<= s;
    }
    return y;
  }

  octa shift_right ARGS((octa, int, int));
  octa shift_right(y, s, u)    /* shift right, arithmetically if u = 0 */
       octa y;
       int s, u;
  {
    while (s >= 32) y.l = y.h, y.h = (u ? 0 : -(y.h >> 31)), s -= 32;
    if (s) { register tetra yhl = y.h << (32 - s), ylh = y.l >> s;
      y.h = (u ? 0 : (-(y.h >> 31)) << (32 - s)) + (y.h >> s); y.l = yhl + ylh;
    }
    return y;
  }

8. Multiplication. We need to multiply two unsigned 64-bit integers, obtaining
an unsigned 128-bit product. It is easy to do this on a 32-bit machine by using
Algorithm 4.3.1M of Seminumerical Algorithms, with b = 2^16.

The following subroutine returns the lower half of the product, and puts the upper
half into a global octabyte called aux.

⟨Subroutines 5⟩ +≡
  octa omult ARGS((octa, octa));
  octa omult(y, z)
       octa y, z;
  {
    register int i, j, k;
    tetra u[4], v[4], w[8];
    register tetra t;
    octa acc;
    ⟨Unpack the multiplier and multiplicand to u and v 10⟩;
    for (j = 0; j < 4; j++) w[j] = 0;
    for (j = 0; j < 4; j++)
      if (!v[j]) w[j + 4] = 0;
      else {
        for (i = k = 0; i < 4; i++) {
          t = u[i] * v[j] + w[i + j] + k;
          w[i + j] = t & 0xffff, k = t >> 16;
        }
        w[j + 4] = k;
      }
    ⟨Pack w into the outputs aux and acc 11⟩;
    return acc;
  }

9. ⟨Global variables 4⟩ +≡
  octa aux;    /* secondary output of subroutines with multiple outputs */
  bool overflow;    /* set by certain subroutines for signed arithmetic */

10. ⟨Unpack the multiplier and multiplicand to u and v 10⟩ ≡
  u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
  v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;

This code is used in section 8.

11. ⟨Pack w into the outputs aux and acc 11⟩ ≡
  aux.h = (w[7] << 16) + w[6], aux.l = (w[5] << 16) + w[4];
  acc.h = (w[3] << 16) + w[2], acc.l = (w[1] << 16) + w[0];

This code is used in section 8.
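
The same radix-2^16 scheme can be written compactly when 64-bit integers are available for the operands and results. The following is a sketch under that assumption, not the book's routine: the type u128 and the function umul128 are hypothetical names, and both halves are returned in a struct instead of through a global.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128;   /* 128-bit product as two halves */

u128 umul128(uint64_t y, uint64_t z) {
    uint32_t u[4], v[4], w[8] = {0};
    u128 p;
    int i, j;
    for (i = 0; i < 4; i++) {               /* unpack into 16-bit digits */
        u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
        v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
    }
    for (j = 0; j < 4; j++) {
        uint32_t k = 0;                     /* carry between digit positions */
        for (i = 0; i < 4; i++) {
            /* at most 0xffff*0xffff + 0xffff + 0xffff = 0xffffffff */
            uint32_t t = u[i] * v[j] + w[i + j] + k;
            w[i + j] = t & 0xffff;
            k = t >> 16;
        }
        w[j + 4] = k;
    }
    p.lo = ((uint64_t)w[3] << 48) | ((uint64_t)w[2] << 32) | ((uint64_t)w[1] << 16) | w[0];
    p.hi = ((uint64_t)w[7] << 48) | ((uint64_t)w[6] << 32) | ((uint64_t)w[5] << 16) | w[4];
    return p;
}
```

The choice b = 2^16 is what keeps every intermediate value t within 32 bits, as the comment in the inner loop indicates.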



ARGS = macro ( ), §2.   bool: enum, §1.   h: tetra, §3.   l: tetra, §3.
octa = struct, §3.   tetra = unsigned int, §3.

12. Signed multiplication has the same lower half product as unsigned multiplication.
The signed upper half product is obtained with at most two further subtractions,
after which the result has overflowed if and only if the upper half is unequal to 64
copies of the sign bit in the lower half.

⟨Subroutines 5⟩ +≡
  octa signed_omult ARGS((octa, octa));
  octa signed_omult(y, z)
       octa y, z;
  {
    octa acc;
    acc = omult(y, z);
    if (y.h & sign_bit) aux = ominus(aux, z);
    if (z.h & sign_bit) aux = ominus(aux, y);
    overflow = (aux.h != aux.l || (aux.h ^ (aux.h >> 1) ^ (acc.h & sign_bit)));
    return acc;
  }
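
The two corrective subtractions and the overflow test can be illustrated with 64-bit types. This sketch is not the book's routine: it assumes <stdint.h>, treats the operands as two's complement bit patterns stored in uint64_t, and forms the unsigned product from 32-bit halves; all names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; int overflow; } smul_result;

/* Unsigned 64 x 64 -> 128 product built from four 32 x 32 partial products. */
static void umul(uint64_t y, uint64_t z, uint64_t *hi, uint64_t *lo) {
    uint64_t yl = (uint32_t)y, yh = y >> 32, zl = (uint32_t)z, zh = z >> 32;
    uint64_t ll = yl * zl, lh = yl * zh, hl = yh * zl, hh = yh * zh;
    uint64_t mid = (ll >> 32) + (uint32_t)lh + (uint32_t)hl;  /* cannot overflow */
    *lo = (mid << 32) | (uint32_t)ll;
    *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

smul_result signed_mul(uint64_t y, uint64_t z) {
    smul_result r;
    umul(y, z, &r.hi, &r.lo);
    if (y >> 63) r.hi -= z;     /* y was negative: subtract z from the upper half */
    if (z >> 63) r.hi -= y;     /* z was negative: subtract y from the upper half */
    /* no overflow iff the upper half is 64 copies of the lower half's sign bit */
    r.overflow = r.hi != ((r.lo >> 63) ? UINT64_MAX : 0);
    return r;
}
```

For example, (−1) × (−1) gives upper half 0 and lower half 1 with no overflow, while 2^62 × 2 sets the overflow flag because 2^63 is not representable as a signed octabyte.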






13. Division. Long division of an unsigned 128-bit integer by an unsigned 64-bit
integer is, of course, one of the most challenging routines needed for MMIX arithmetic.
The following program, based on Algorithm 4.3.1D of Seminumerical Algorithms,
computes octabytes q and r such that (2^64 x + y) = qz + r and 0 ≤ r < z, given
octabytes x, y, and z, assuming that x < z. (If x ≥ z, it simply sets q = x and
r = y.) The quotient q is returned by the subroutine; the remainder r is stored
in aux.

⟨Subroutines 5⟩ +≡
  octa odiv ARGS((octa, octa, octa));
  octa odiv(x, y, z)
       octa x, y, z;
  {
    register int i, j, k, n, d;
    tetra u[8], v[4], q[4], mask, qhat, rhat, vh, vmh;
    register tetra t;
    octa acc;
    ⟨Check that x < z; otherwise give trivial answer 14⟩;
    ⟨Unpack the dividend and divisor to u and v 15⟩;
    ⟨Determine the number of significant places n in the divisor v 16⟩;
    ⟨Normalize the divisor 17⟩;
    for (j = 3; j >= 0; j--) ⟨Determine the quotient digit q[j] 20⟩;
    ⟨Unnormalize the remainder 18⟩;
    ⟨Pack q and u to acc and aux 19⟩;
    return acc;
  }

14. ⟨Check that x < z; otherwise give trivial answer 14⟩ ≡
  if (x.h > z.h || (x.h == z.h && x.l >= z.l)) {
    aux = y; return x;
  }

This code is used in section 13.

15. ⟨Unpack the dividend and divisor to u and v 15⟩ ≡
  u[7] = x.h >> 16, u[6] = x.h & 0xffff, u[5] = x.l >> 16, u[4] = x.l & 0xffff;
  u[3] = y.h >> 16, u[2] = y.h & 0xffff, u[1] = y.l >> 16, u[0] = y.l & 0xffff;
  v[3] = z.h >> 16, v[2] = z.h & 0xffff, v[1] = z.l >> 16, v[0] = z.l & 0xffff;

This code is used in section 13.

16. ⟨Determine the number of significant places n in the divisor v 16⟩ ≡
  for (n = 4; v[n - 1] == 0; n--) ;

This code is used in section 13.



ARCS = macro (), §2. octa = struct, §3. 

aux: octa, §9. ominus: octa (), §5. 

h: tetra, §3. omult: octa (), §8. 

1: tetra, §3. 



overflow: bool, §9. 
sign.bit = macro, §4. 

tetra = unsigned int, §3. 
17. We shift u and v left by d places, where d is chosen to make $2^{15} \le v_{n-1} < 2^{16}$.

⟨Normalize the divisor 17⟩ ≡
  vh = v[n - 1];
  for (d = 0; vh < 0x8000; d++, vh <<= 1) ;
  for (j = k = 0; j < n + 4; j++) {
    t = (u[j] << d) + k;
    u[j] = t & 0xffff, k = t >> 16;
  }
  for (j = k = 0; j < n; j++) {
    t = (v[j] << d) + k;
    v[j] = t & 0xffff, k = t >> 16;
  }
  vh = v[n - 1];
  vmh = (n > 1 ? v[n - 2] : 0);

This code is used in section 13.

18. ⟨Unnormalize the remainder 18⟩ ≡
  mask = (1 << d) - 1;
  for (j = 3; j >= n; j--) u[j] = 0;
  for (k = 0; j >= 0; j--) {
    t = (k << 16) + u[j];
    u[j] = t >> d, k = t & mask;
  }

This code is used in section 13.

19. ⟨Pack q and u to acc and aux 19⟩ ≡
  acc.h = (q[3] << 16) + q[2], acc.l = (q[1] << 16) + q[0];
  aux.h = (u[3] << 16) + u[2], aux.l = (u[1] << 16) + u[0];

This code is used in section 13.

20. ⟨Determine the quotient digit q[j] 20⟩ ≡
  {
    ⟨Find the trial quotient, $\hat q$ 21⟩;
    ⟨Subtract $b^j \hat q v$ from u 22⟩;
    ⟨If the result was negative, decrease $\hat q$ by 1 23⟩;
    q[j] = qhat;
  }

This code is used in section 13.

21. ⟨Find the trial quotient, $\hat q$ 21⟩ ≡
  t = (u[j + n] << 16) + u[j + n - 1];
  qhat = t / vh, rhat = t - vh * qhat;
  if (n > 1)
    while (qhat == 0x10000 || qhat * vmh > (rhat << 16) + u[j + n - 2]) {
      qhat--, rhat += vh;
      if (rhat >= 0x10000) break;
    }

This code is used in section 20.
22. After this step, u[j + n] will either equal k or k - 1. The true value of u would
be obtained by subtracting k from u[j + n]; but we don't have to fuss over u[j + n],
because it won't be examined later.

⟨Subtract $b^j \hat q v$ from u 22⟩ ≡
  for (i = k = 0; i < n; i++) {
    t = u[i + j] + 0xffff0000 - k - qhat * v[i];
    u[i + j] = t & 0xffff, k = 0xffff - (t >> 16);
  }

This code is used in section 20.

23. The correction here occurs only rarely, but it can be necessary; for example,
when dividing the number 0x7fff800100000000 by 0x800080020005.

⟨If the result was negative, decrease $\hat q$ by 1 23⟩ ≡
  if (u[j + n] != k) {
    qhat--;
    for (i = k = 0; i < n; i++) {
      t = u[i + j] + v[i] + k;
      u[i + j] = t & 0xffff, k = t >> 16;
    }
  }

This code is used in section 20.
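
The pieces of sections 15 through 23 can be tried out in one place. The following is a minimal sketch, assuming a 64-bit dividend (the upper half x of the 128-bit dividend is zero) and standard `<stdint.h>` types in place of octa and tetra; the name `algd_divide` and the `fixes` counter are illustrative and do not occur in MMIX-ARITH.

```c
#include <stdint.h>

/* Divide y by z (z != 0) with 16-bit digits, following sections 15-23 with
   x = 0.  The remainder goes to *rem; *fixes counts add-back corrections. */
uint64_t algd_divide(uint64_t y, uint64_t z, uint64_t *rem, int *fixes)
{
    uint32_t u[8] = {0}, v[4], q[4] = {0};
    uint32_t t, k, qhat, rhat, vh, vmh, mask;
    int i, j, n, d;

    *fixes = 0;
    for (i = 0; i < 4; i++) {                 /* unpack into 16-bit digits */
        u[i] = (uint32_t)(y >> (16 * i)) & 0xffff;
        v[i] = (uint32_t)(z >> (16 * i)) & 0xffff;
    }
    for (n = 4; v[n - 1] == 0; n--) ;         /* significant places of v */
    vh = v[n - 1];
    for (d = 0; vh < 0x8000; d++, vh <<= 1) ; /* normalizing shift */
    for (j = 0, k = 0; j < n + 4; j++) {
        t = (u[j] << d) + k;
        u[j] = t & 0xffff, k = t >> 16;
    }
    for (j = 0, k = 0; j < n; j++) {
        t = (v[j] << d) + k;
        v[j] = t & 0xffff, k = t >> 16;
    }
    vh = v[n - 1];
    vmh = (n > 1 ? v[n - 2] : 0);

    for (j = 3; j >= 0; j--) {
        t = (u[j + n] << 16) + u[j + n - 1];  /* trial quotient */
        qhat = t / vh, rhat = t - vh * qhat;
        if (n > 1)
            while (qhat == 0x10000 || qhat * vmh > (rhat << 16) + u[j + n - 2]) {
                qhat--, rhat += vh;
                if (rhat >= 0x10000) break;
            }
        for (i = 0, k = 0; i < n; i++) {      /* subtract qhat * v */
            t = u[i + j] + 0xffff0000u - k - qhat * v[i];
            u[i + j] = t & 0xffff, k = 0xffff - (t >> 16);
        }
        if (u[j + n] != k) {                  /* went negative: add v back */
            qhat--, (*fixes)++;
            for (i = 0, k = 0; i < n; i++) {
                t = u[i + j] + v[i] + k;
                u[i + j] = t & 0xffff, k = t >> 16;
            }
        }
        q[j] = qhat;
    }
    mask = (1u << d) - 1;                     /* unnormalize the remainder */
    for (j = 3; j >= n; j--) u[j] = 0;
    for (k = 0; j >= 0; j--) {
        t = (k << 16) + u[j];
        u[j] = t >> d, k = t & mask;
    }
    *rem = ((uint64_t)((u[3] << 16) + u[2]) << 32) + ((u[1] << 16) + u[0]);
    return ((uint64_t)((q[3] << 16) + q[2]) << 32) + ((q[1] << 16) + q[0]);
}
```

Calling it on the example of section 23 (0x7fff800100000000 divided by 0x800080020005) exercises the add-back branch once; the result can be compared with the native `/` and `%` operators.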



24. Signed division can be reduced to unsigned division in a tedious but
straightforward manner. We assume that the divisor isn't zero.

⟨Subroutines 5⟩ +≡
  octa signed_odiv ARGS((octa, octa));
  octa signed_odiv(y, z)
    octa y, z;
  {
    octa yy, zz, q;
    register int sy, sz;
    if (y.h & sign_bit) sy = 2, yy = ominus(zero_octa, y);
    else sy = 0, yy = y;
    if (z.h & sign_bit) sz = 1, zz = ominus(zero_octa, z);
    else sz = 0, zz = z;
    q = odiv(zero_octa, yy, zz);
    overflow = false;
    switch (sy + sz) {
    case 2 + 1: aux = ominus(zero_octa, aux);
      if (q.h == sign_bit) overflow = true;
    case 0 + 0: return q;
    case 2 + 0: if (aux.h || aux.l) aux = ominus(zz, aux);
      goto negate_q;
    case 0 + 1: if (aux.h || aux.l) aux = ominus(aux, zz);
    negate_q: if (aux.h || aux.l) return ominus(neg_one, q);
      else return ominus(zero_octa, q);
    }
  }
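
The four sign cases above amount to floored division: the quotient is rounded toward minus infinity, and a nonzero remainder takes the sign of the divisor. As a small sketch of that convention, assuming native 64-bit integers (the helper name `floor_divide` is hypothetical and not part of MMIX-ARITH):

```c
#include <stdint.h>

/* Floored division.  Assumes z != 0 and excludes the overflow case
   INT64_MIN / -1.  The remainder is stored in *rem. */
int64_t floor_divide(int64_t y, int64_t z, int64_t *rem)
{
    int64_t q = y / z, r = y % z;      /* C truncates toward zero */
    if (r != 0 && ((r < 0) != (z < 0))) {
        q -= 1;                        /* step down to the floor */
        r += z;                        /* remainder now has the sign of z */
    }
    *rem = r;
    return q;
}
```

For instance, dividing -7 by 2 gives quotient -4 and remainder 1, matching the case 2 + 0 branch, where the unsigned quotient 3 becomes -1 - 3.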
25. Bit fiddling. The bitwise operators of MMIX are fairly easy to implement
directly, but three of them occur often enough to deserve packaging as subroutines.

⟨Subroutines 5⟩ +≡
  octa oand ARGS((octa, octa));
  octa oand(y, z) /* compute y ∧ z */
    octa y, z;
  { octa x;
    x.h = y.h & z.h; x.l = y.l & z.l;
    return x;
  }

  octa oandn ARGS((octa, octa));
  octa oandn(y, z) /* compute y ∧ z̄ */
    octa y, z;
  { octa x;
    x.h = y.h & ~z.h; x.l = y.l & ~z.l;
    return x;
  }

  octa oxor ARGS((octa, octa));
  octa oxor(y, z) /* compute y ⊕ z */
    octa y, z;
  { octa x;
    x.h = y.h ^ z.h; x.l = y.l ^ z.l;
    return x;
  }



26. Here's a fun way to count the number of bits in a tetrabyte. [This classical
trick is called the “Gillies-Miller method for sideways addition” in The Preparation
of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second
edition (Reading, Mass.: Addison-Wesley, 1957), 191-193. Some of the tricks used
here were suggested by Balbir Singh, Peter Rossmanith, and Stefan Schwoon.]

⟨Subroutines 5⟩ +≡
  int count_bits ARGS((tetra));
  int count_bits(x)
    tetra x;
  {
    register int xx = x;
    xx = xx - ((xx >> 1) & 0x55555555);
    xx = (xx & 0x33333333) + ((xx >> 2) & 0x33333333);
    xx = (xx + (xx >> 4)) & 0x0f0f0f0f;
    xx = xx + (xx >> 8);
    return (xx + (xx >> 16)) & 0xff;
  }
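
One way to gain confidence in the sideways-addition steps is to compare them with a bit-by-bit loop. The sketch below assumes `uint32_t` in place of tetra; both function names are illustrative only.

```c
#include <stdint.h>

/* Sideways addition on an unsigned 32-bit word (same steps as count_bits). */
int count_bits32(uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                 /* sums of 2-bit fields */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* sums of 4-bit fields */
    x = (x + (x >> 4)) & 0x0f0f0f0fu;                 /* sums of 8-bit fields */
    x = x + (x >> 8);
    return (int)((x + (x >> 16)) & 0xff);
}

/* Straightforward reference: examine each bit in turn. */
int count_bits_naive(uint32_t x)
{
    int n = 0;
    for (; x != 0; x >>= 1) n += (int)(x & 1);
    return n;
}
```

The two functions should agree on every input; the all-ones word 0xffffffff, for example, yields 32 from both.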

27. To compute the nonnegative byte differences of two given tetrabytes, we can
carry out the following 20-step branchless computation:

⟨Subroutines 5⟩ +≡
  tetra byte_diff ARGS((tetra, tetra));
  tetra byte_diff(y, z)
    tetra y, z;
  {
    register tetra d = (y & 0x00ff00ff) + 0x01000100 - (z & 0x00ff00ff);
    register tetra m = d & 0x01000100;
    register tetra x = d & (m - (m >> 8));
    d = ((y >> 8) & 0x00ff00ff) + 0x01000100 - ((z >> 8) & 0x00ff00ff);
    m = d & 0x01000100;
    return x + ((d & (m - (m >> 8))) << 8);
  }
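
The role of the guard bits 0x01000100 is easier to see next to a byte-at-a-time version. This is a sketch assuming `uint32_t` for tetra; `byte_diff32` restates the routine above and `byte_diff_naive` is a hypothetical reference used only for comparison.

```c
#include <stdint.h>

uint32_t byte_diff32(uint32_t y, uint32_t z)
{
    /* even-numbered bytes: a guard bit above each lane absorbs the borrow */
    uint32_t d = (y & 0x00ff00ffu) + 0x01000100u - (z & 0x00ff00ffu);
    uint32_t m = d & 0x01000100u;      /* guard survives iff y byte >= z byte */
    uint32_t x = d & (m - (m >> 8));   /* 0xff mask for each surviving lane */
    /* odd-numbered bytes, handled the same way after shifting down */
    d = ((y >> 8) & 0x00ff00ffu) + 0x01000100u - ((z >> 8) & 0x00ff00ffu);
    m = d & 0x01000100u;
    return x + ((d & (m - (m >> 8))) << 8);
}

/* Reference: saturating subtraction, one byte at a time. */
uint32_t byte_diff_naive(uint32_t y, uint32_t z)
{
    uint32_t x = 0;
    int i;
    for (i = 0; i < 32; i += 8) {
        uint32_t a = (y >> i) & 0xff, b = (z >> i) & 0xff;
        if (a > b) x |= (a - b) << i;
    }
    return x;
}
```

With y = 0x05ff0010 and z = 0x03000120 the byte differences are 02, ff, 00, 00, so both versions return 0x02ff0000.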

28. To compute the nonnegative wyde differences of two tetrabytes, another trick
leads to a 15-step branchless computation. (Research problem: Can count_bits,
byte_diff, or wyde_diff be done with fewer operations?)

⟨Subroutines 5⟩ +≡
  tetra wyde_diff ARGS((tetra, tetra));
  tetra wyde_diff(y, z)
    tetra y, z;
  {
    register tetra a = ((y >> 16) - (z >> 16)) & 0x10000;
    register tetra b = ((y & 0xffff) - (z & 0xffff)) & 0x10000;
    return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
  }
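
Here a and b flag the wydes in which y is smaller, and the expression b - a - (b >> 16) turns those flags into a mask that replaces the offending wyde of z by the corresponding wyde of y, so that the subtraction yields zero there. A sketch for experimentation, assuming `uint32_t` for tetra (the reference function is hypothetical):

```c
#include <stdint.h>

uint32_t wyde_diff32(uint32_t y, uint32_t z)
{
    uint32_t a = ((y >> 16) - (z >> 16)) & 0x10000u;         /* high wyde of y is smaller */
    uint32_t b = ((y & 0xffffu) - (z & 0xffffu)) & 0x10000u; /* low wyde of y is smaller */
    /* mask is 0, 0xffff, 0xffff0000, or 0xffffffff according to (a, b) */
    return y - (z ^ ((y ^ z) & (b - a - (b >> 16))));
}

/* Reference: saturating subtraction on each 16-bit half. */
uint32_t wyde_diff_naive(uint32_t y, uint32_t z)
{
    uint32_t yh = y >> 16, zh = z >> 16, yl = y & 0xffff, zl = z & 0xffff;
    return ((yh > zh ? yh - zh : 0) << 16) | (yl > zl ? yl - zl : 0);
}
```

For y = 0x00050003 and z = 0x00020007 the high wydes differ by 3 and the low difference saturates at zero, giving 0x00030000.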
29. The last bitwise subroutine we need is the most interesting: It implements
MMIX's MOR and MXOR operations.

⟨Subroutines 5⟩ +≡
  octa bool_mult ARGS((octa, octa, bool));
  octa bool_mult(y, z, xor)
    octa y, z; /* the operands */
    bool xor; /* do we do xor instead of or? */
  {
    octa o, x;
    register tetra a, b, c;
    register int k;
    for (k = 0, o = y, x = zero_octa; o.h || o.l; k++, o = shift_right(o, 8, 1))
      if (o.l & 0xff) {
        a = ((z.h >> k) & 0x01010101) * 0xff;
        b = ((z.l >> k) & 0x01010101) * 0xff;
        c = (o.l & 0xff) * 0x01010101;
        if (xor) x.h ^= a & c, x.l ^= b & c;
        else x.h |= a & c, x.l |= b & c;
      }
    return x;
  }
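
The loop treats each octabyte as an 8 x 8 bit matrix: byte i of the result collects (by OR or XOR) those bytes k of y for which bit k of byte i of z is set. A sketch with a single 64-bit word, assuming `uint64_t` in place of octa (the name `bool_mult64` is illustrative):

```c
#include <stdint.h>

uint64_t bool_mult64(uint64_t y, uint64_t z, int use_xor)
{
    uint64_t x = 0, o = y;
    int k;
    for (k = 0; o != 0; k++, o >>= 8)
        if (o & 0xff) {
            /* spread bit k of every byte of z across that whole byte */
            uint64_t a = ((z >> k) & 0x0101010101010101ull) * 0xff;
            /* replicate byte k of y into all eight byte positions */
            uint64_t c = (o & 0xff) * 0x0101010101010101ull;
            if (use_xor) x ^= a & c;
            else x |= a & c;
        }
    return x;
}
```

Under this reading, z = 0x8040201008040201 acts as the identity matrix, and z = 0x0102040810204080 reverses the bytes of y.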



30. Floating point packing and unpacking. Standard IEEE floating binary
numbers pack a sign, exponent, and fraction into a tetrabyte or octabyte. In this
section we consider basic subroutines that convert between IEEE format and the
separate unpacked components.

#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4
⟨Global variables 4⟩ +≡
  int cur_round; /* the current rounding mode */



31. The fpack routine takes an octabyte f, a raw exponent e, and a sign s, and
packs them into the floating binary number that corresponds to $\pm 2^{e-1076} f$, using a
given rounding mode. The value of f should satisfy $2^{54} \le f \le 2^{55}$.

Thus, for example, the floating binary number +1.0 = 0x3ff0000000000000 is
obtained when $f = 2^{54}$, e = 0x3fe, and s = '+'. The raw exponent e is usually one
less than the final exponent value; the leading bit of f is essentially added to the
exponent. (This trick works nicely for subnormal numbers, when e < 0, or in cases
where the value of f is rounded upwards to $2^{55}$.)

Exceptional events are noted by oring appropriate bits into the global variable 
exceptions . Special considerations apply to underflow, which is not fully specified by 
Section 7.4 of the IEEE standard: Implementations of the standard are free to choose 
between two definitions of “tininess” and two definitions of “accuracy loss.” MMIX 
determines tininess after rounding, hence a result with e < 0 is not necessarily tiny; 
MMIX treats accuracy loss as equivalent to inexactness. Thus, a result underflows if 
and only if it is tiny and either (i) it is inexact or (ii) the underflow trap is enabled. 
The fpack routine sets U_BIT in exceptions if and only if the result is tiny, X_BIT if 
and only if the result is inexact. 



#define X_BIT (1 << 8) /* floating inexact */
#define Z_BIT (1 << 9) /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define E_BIT (1 << 18) /* external (dynamic) trap bit */



⟨Subroutines 5⟩ +≡
  octa fpack ARGS((octa, int, char, int));
  octa fpack(f, e, s, r)
    octa f; /* the normalized fraction part */
    int e; /* the raw exponent */
    char s; /* the sign */
    int r; /* the rounding mode */
  {
    octa o;
    if (e > 0x7fd) e = 0x7ff, o = zero_octa;
    else {
      if (e < 0) {
        if (e < -54) o.h = 0, o.l = 1;
        else { octa oo;
          o = shift_right(f, -e, 1);
          oo = shift_left(o, -e);
          if (oo.l != f.l || oo.h != f.h) o.l |= 1; /* sticky bit */
        }
        e = 0;
      } else o = f;
    }
    ⟨Round and return the result 33⟩;
  }

32. ⟨Global variables 4⟩ +≡
  int exceptions; /* bits possibly destined for rA */

33. Everything falls together so nicely here, it's almost too good to be true!

⟨Round and return the result 33⟩ ≡
  if (o.l & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o = incr(o, 3); break;
  case ROUND_UP: if (s != '-') o = incr(o, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: o = incr(o, o.l & 4 ? 2 : 1); break;
  }
  o = shift_right(o, 2, 1);
  o.h += e << 20;
  if (o.h >= 0x7ff00000) exceptions |= O_BIT + X_BIT; /* overflow */
  else if (o.h < 0x100000) exceptions |= U_BIT; /* tininess */
  if (s == '-') o.h |= sign_bit;
  return o;

This code is used in section 31.
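
The way the leading bit of f carries into the exponent field can be seen in a stripped-down version. The sketch below is an assumption-laden simplification: it uses `uint64_t` for octa, handles only round-to-nearest with 0 ≤ e ≤ 0x7fd, and omits subnormals, overflow, and the exceptions flags; the name `pack_near` is hypothetical.

```c
#include <stdint.h>

/* Pack fraction f (2^54 <= f <= 2^55), raw exponent e, and sign s into
   IEEE double bits, rounding to nearest with ties to even. */
uint64_t pack_near(uint64_t f, int e, char s)
{
    uint64_t o = f + ((f & 4) ? 2 : 1);  /* add 2 if the kept part is odd, else 1 */
    o >>= 2;                             /* drop the two extra bits */
    o += (uint64_t)e << 52;              /* leading bit of f bumps the exponent */
    if (s == '-') o |= 0x8000000000000000ull;
    return o;
}
```

With f = 2^54, e = 0x3fe, and s = '+', this returns 0x3ff0000000000000, the value quoted in section 31; a fraction that rounds up to 2^55 simply carries one step further into the exponent.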



34. Similarly, sfpack packs a short float, from inputs having the same conventions
as fpack.

⟨Subroutines 5⟩ +≡
  tetra sfpack ARGS((octa, int, char, int));
  tetra sfpack(f, e, s, r)
    octa f; /* the fraction part */
    int e; /* the raw exponent */
    char s; /* the sign */
    int r; /* the rounding mode */
  {
    register tetra o;
    if (e > 0x47d) e = 0x47f, o = 0;
    else {
      o = shift_left(f, 3).h;
      if (f.l & 0x1fffffff) o |= 1;
      if (e < 0x380) {
        if (e < 0x380 - 25) o = 1;
        else { register tetra o0, oo;
          o0 = o;
          o = o >> (0x380 - e);
          oo = o << (0x380 - e);
          if (oo != o0) o |= 1; /* sticky bit */
        }
        e = 0x380;
      }
    }
    ⟨Round and return the short result 35⟩;
  }

35. ⟨Round and return the short result 35⟩ ≡
  if (o & 3) exceptions |= X_BIT;
  switch (r) {
  case ROUND_DOWN: if (s == '-') o += 3; break;
  case ROUND_UP: if (s != '-') o += 3;
  case ROUND_OFF: break;
  case ROUND_NEAR: o += (o & 4 ? 2 : 1); break;
  }
  o = o >> 2;
  o += (e - 0x380) << 23;
  if (o >= 0x7f800000) exceptions |= O_BIT + X_BIT; /* overflow */
  else if (o < 0x100000) exceptions |= U_BIT; /* tininess */
  if (s == '-') o |= sign_bit;
  return o;

This code is used in section 34.

36. The funpack routine is, roughly speaking, the opposite of fpack. It takes a
given floating point number x and separates out its fraction part f, exponent e, and
sign s. It clears exceptions to zero. It returns the type of value found: zro, num, inf,
or nan. When it returns num, it will have set f, e, and s to the values from which
fpack would produce the original number x without exceptions.

#define zero_exponent (-1000) /* zero is assumed to have this exponent */

⟨Other type definitions 36⟩ ≡
  typedef enum {
    zro, num, inf, nan
  } ftype;

See also section 59.

This code is used in section 1.

37. ⟨Subroutines 5⟩ +≡
  ftype funpack ARGS((octa, octa *, int *, char *));
  ftype funpack(x, f, e, s)
    octa x; /* the given floating point value */
    octa *f; /* address where the fraction part should be stored */
    int *e; /* address where the exponent part should be stored */
    char *s; /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x.h & sign_bit ? '-' : '+');
    *f = shift_left(x, 2);
    f->h &= 0x3fffff;
    ee = (x.h >> 20) & 0x7ff;
    if (ee) {
      *e = ee - 1;
      f->h |= 0x400000;
      return (ee < 0x7ff ? num : f->h == 0x400000 && !f->l ? inf : nan);
    }
    if (!x.l && !f->h) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee; return num;
  }
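
For ordinary numbers the unpacking is the mirror image of the packing step. Here is a sketch restricted to finite nonzero inputs, assuming `uint64_t` for octa; the name `unpack_num` is hypothetical, and the zero, infinity, and NaN cases of funpack are deliberately left out.

```c
#include <stdint.h>

/* Unpack finite, nonzero IEEE double bits into the fraction f
   (2^54 <= f < 2^55), the raw exponent e, and the sign s. */
void unpack_num(uint64_t x, uint64_t *f, int *e, char *s)
{
    int ee = (int)((x >> 52) & 0x7ff);
    *s = (x >> 63) ? '-' : '+';
    *f = (x << 2) & 0x003fffffffffffffull;  /* 52 fraction bits, then two zero bits */
    if (ee) {                               /* normal: restore the hidden bit */
        *e = ee - 1;
        *f |= 0x0040000000000000ull;
        return;
    }
    /* subnormal: shift left until the leading bit reaches position 54 */
    do { ee--; *f <<= 1; } while (!(*f & 0x0040000000000000ull));
    *e = ee;
}
```

Unpacking 1.0 (0x3ff0000000000000) gives f = 2^54, e = 0x3fe, and s = '+', the same triple from which section 31 says fpack produces +1.0. The smallest subnormal comes out with e = -52.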



38. ⟨Subroutines 5⟩ +≡
  ftype sfunpack ARGS((tetra, octa *, int *, char *));
  ftype sfunpack(x, f, e, s)
    tetra x; /* the given floating point value */
    octa *f; /* address where the fraction part should be stored */
    int *e; /* address where the exponent part should be stored */
    char *s; /* address where the sign should be stored */
  {
    register int ee;
    exceptions = 0;
    *s = (x & sign_bit ? '-' : '+');
    f->h = (x >> 1) & 0x3fffff, f->l = x << 31;
    ee = (x >> 23) & 0xff;
    if (ee) {
      *e = ee + 0x380 - 1;
      f->h |= 0x400000;
      return (ee < 0xff ? num : (x & 0x7fffffff) == 0x7f800000 ? inf : nan);
    }
    if (!(x & 0x7fffffff)) {
      *e = zero_exponent; return zro;
    }
    do { ee--; *f = shift_left(*f, 1); } while (!(f->h & 0x400000));
    *e = ee + 0x380; return num;
  }

39. Since MMIX downplays 32-bit operations, it uses sfpack and sfunpack only when
loading and storing short floats, or when converting from fixed point to floating point.

⟨Subroutines 5⟩ +≡
  octa load_sf ARGS((tetra));
  octa load_sf(z)
    tetra z; /* 32 bits to be loaded into a 64-bit register */
  {
    octa f, x; int e; char s; ftype t;
    t = sfunpack(z, &f, &e, &s);
    switch (t) {
    case zro: x = zero_octa; break;
    case num: return fpack(f, e, s, ROUND_OFF);
    case inf: x = inf_octa; break;
    case nan: x = shift_right(f, 2, 1); x.h |= 0x7ff00000; break;
    }
    if (s == '-') x.h |= sign_bit;
    return x;
  }

40. ⟨Subroutines 5⟩ +≡
  tetra store_sf ARGS((octa));
  tetra store_sf(x)
    octa x; /* 64 bits to be loaded into a 32-bit word */
  {
    octa f; tetra z; int e; char s; ftype t;
    t = funpack(x, &f, &e, &s);
    switch (t) {
    case zro: z = 0; break;
    case num: return sfpack(f, e, s, cur_round);
    case inf: z = 0x7f800000; break;
    case nan: if (!(f.h & 0x200000)) {
        f.h |= 0x200000; exceptions |= I_BIT; /* NaN was signaling */
      }
      z = 0x7f800000 | (f.h << 1) | (f.l >> 31); break;
    }
    if (s == '-') z |= sign_bit;
    return z;
  }



41. Floating multiplication and division. The hardest fixed point operations
were multiplication and division; but these two operations are the easiest to implement
in floating point arithmetic, once their fixed point counterparts are available.

⟨Subroutines 5⟩ +≡
  octa fmult ARGS((octa, octa));
  octa fmult(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+'; /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + zro: case 4 * zro + num: case 4 * num + zro: x = zero_octa; break;
    case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf: x = inf_octa; break;
    case 4 * zro + inf: case 4 * inf + zro: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Multiply nonzero numbers and return 43⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

42. ⟨The usual NaN cases 42⟩ ≡
  case 4 * nan + nan: if (!(y.h & 0x80000)) exceptions |= I_BIT; /* y is signaling */
  case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
    return z;
  case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf:
    if (!(y.h & 0x80000)) exceptions |= I_BIT, y.h |= 0x80000;
    return y;

This code is used in sections 41, 44, 46, and 93.

43. ⟨Multiply nonzero numbers and return 43⟩ ≡
  xe = ye + ze - 0x3fd; /* the raw exponent */
  x = omult(yf, shift_left(zf, 9));
  if (aux.h >= 0x400000) xf = aux;
  else xf = shift_left(aux, 1), xe--;
  if (x.h || x.l) xf.l |= 1; /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 41.
44. ⟨Subroutines 5⟩ +≡
  octa fdivide ARGS((octa, octa));
  octa fdivide(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    xs = ys + zs - '+'; /* will be '-' when the result is negative */
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + inf: case 4 * zro + num: case 4 * num + inf: x = zero_octa; break;
    case 4 * num + zro: exceptions |= Z_BIT;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; break;
    case 4 * zro + zro: case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * num + num: ⟨Divide nonzero numbers and return 45⟩;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

45. ⟨Divide nonzero numbers and return 45⟩ ≡
  xe = ye - ze + 0x3fd; /* the raw exponent */
  xf = odiv(yf, zero_octa, shift_left(zf, 9));
  if (xf.h >= 0x800000) {
    aux.l |= xf.l & 1;
    xf = shift_right(xf, 1, 1);
    xe++;
  }
  if (aux.h || aux.l) xf.l |= 1; /* adjust the sticky bit */
  return fpack(xf, xe, xs, cur_round);

This code is used in section 44.



46. Floating addition and subtraction. Now for the bread-and-butter
operation, the sum of two floating point numbers. It is not terribly difficult, but many
cases need to be handled carefully.

⟨Subroutines 5⟩ +≡
  octa fplus ARGS((octa, octa));
  octa fplus(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa x, xf, yf, zf;
    register int xe, d;
    register char xs;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨The usual NaN cases 42⟩;
    case 4 * zro + num: return fpack(zf, ze, zs, ROUND_OFF); break; /* may underflow */
    case 4 * num + zro: return fpack(yf, ye, ys, ROUND_OFF); break; /* may underflow */
    case 4 * inf + inf: if (ys != zs) {
        exceptions |= I_BIT; x = standard_NaN; xs = zs; break;
      }
    case 4 * num + inf: case 4 * zro + inf: x = inf_octa; xs = zs; break;
    case 4 * inf + num: case 4 * inf + zro: x = inf_octa; xs = ys; break;
    case 4 * num + num: if (y.h != (z.h ^ 0x80000000) || y.l != z.l)
        ⟨Add nonzero numbers and return 47⟩;
    case 4 * zro + zro: x = zero_octa;
      xs = (ys == zs ? ys : cur_round == ROUND_DOWN ? '-' : '+'); break;
    }
    if (xs == '-') x.h |= sign_bit;
    return x;
  }

47. ⟨Add nonzero numbers and return 47⟩ ≡
  { octa o, oo;
    if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
      ⟨Exchange y with z 48⟩;
    d = ye - ze;
    xs = ys, xe = ye;
    if (d) ⟨Adjust for difference in exponents 49⟩;
    if (ys == zs) {
      xf = oplus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
    } else {
      xf = ominus(yf, zf);
      if (xf.h >= 0x800000) xe++, d = xf.l & 1, xf = shift_right(xf, 1, 1), xf.l |= d;
      else while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
    }
    return fpack(xf, xe, xs, cur_round);
  }

This code is used in section 46.

48. ⟨Exchange y with z 48⟩ ≡
  {
    o = yf, yf = zf, zf = o;
    d = ye, ye = ze, ze = d;
    d = ys, ys = zs, zs = d;
  }

This code is used in sections 47 and 51.

49. Proper rounding requires two bits to the right of the fraction delivered to fpack.
The first is the true next bit of the result; the other is a “sticky” bit, which is nonzero
if any further bits of the true result are nonzero. Sticky rounding to an integer takes
x into the number $\lfloor x/2 \rfloor + \lceil x/2 \rceil$.

Some subtleties need to be observed here, in order to prevent the sticky bit from
being shifted left. If we did not shift yf left 1 before shifting zf to the right,
an incorrect answer would be obtained in certain cases; for example, if $yf = 2^{54}$,
$zf = 2^{54} + 2^{53} - 1$, d = 52.

⟨Adjust for difference in exponents 49⟩ ≡
  {
    if (d <= 2) zf = shift_right(zf, d, 1); /* exact result */
    else if (d > 53) zf.h = 0, zf.l = 1; /* tricky but OK */
    else {
      if (ys != zs) d--, xe--, yf = shift_left(yf, 1);
      o = zf;
      zf = shift_right(o, d, 1);
      oo = shift_left(zf, d);
      if (oo.l != o.l || oo.h != o.h) zf.l |= 1;
    }
  }

This code is used in section 47.
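
The shift, shift back, and compare idiom used above to set the sticky bit can be isolated in a few lines. A sketch assuming `uint64_t` in place of octa; `shift_right_sticky` is an illustrative name, not a routine of MMIX-ARITH.

```c
#include <stdint.h>

/* Shift x right by d places (0 <= d <= 63); if any nonzero bits fall off
   the right end, record that fact by setting the lowest bit of the result. */
uint64_t shift_right_sticky(uint64_t x, int d)
{
    uint64_t y = x >> d;
    if ((y << d) != x) y |= 1;   /* something was lost: make it sticky */
    return y;
}
```

For example, shifting 0x13 right by 2 gives 4 with the bits 11 lost, so the sticky result is 5, while 0x10 shifts exactly to 4.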



50. The comparison of floating point numbers with respect to ε shares some of the
characteristics of floating point addition/subtraction. In some ways it is simpler, and
in other ways it is more difficult; we might as well deal with it now.

Subroutine fepscomp(y, z, e, s) returns 2 if y, z, or e is a NaN or e is negative. It
returns 1 if s = 0 and y ≈ z (ε) or if s ≠ 0 and y ∼ z (ε), as defined in Section 4.2.2
of Seminumerical Algorithms; otherwise it returns 0.

⟨Subroutines 5⟩ +≡
  int fepscomp ARGS((octa, octa, octa, int));
  int fepscomp(y, z, e, s)
    octa y, z, e; /* the operands */
    int s; /* test similarity? */
  {
    octa yf, zf, ef, o, oo;
    int ye, ze, ee;
    char ys, zs, es;
    register int yt, zt, et, d;
    et = funpack(e, &ef, &ee, &es);
    if (es == '-') return 2;
    switch (et) {
    case nan: return 2;
    case inf: ee = 10000;
    case num: case zro: break;
    }
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    case 4 * nan + nan: case 4 * nan + inf: case 4 * nan + num: case 4 * nan + zro:
    case 4 * inf + nan: case 4 * num + nan: case 4 * zro + nan: return 2;
    case 4 * inf + inf: return (ys == zs || ee >= 1023);
    case 4 * inf + num: case 4 * inf + zro: case 4 * num + inf: case 4 * zro + inf:
      return (s && ee >= 1022);
    case 4 * zro + zro: return 1;
    case 4 * zro + num: case 4 * num + zro: if (!s) return 0;
    case 4 * num + num: break;
    }
    ⟨Compare two numbers with respect to epsilon and return 51⟩;
  }

51. The relation y ≈ z (ε) reduces to y ∼ z ($\epsilon/2^d$), if d is the difference between the
larger and smaller exponents of y and z.

⟨Compare two numbers with respect to epsilon and return 51⟩ ≡
  ⟨Unsubnormalize y and z, if they are subnormal 52⟩;
  if (ye < ze || (ye == ze && (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l))))
    ⟨Exchange y with z 48⟩;
  if (ze == zero_exponent) ze = ye;
  d = ye - ze;
  if (!s) ee -= d;
  if (ee >= 1023) return 1; /* if ε ≥ 2, z ∈ N_ε(y) */
  ⟨Compute the difference of fraction parts, o 53⟩;
  if (!o.h && !o.l) return 1;
  if (ee < 968) return 0; /* if y ≠ z and ε < 2^{-54}, y ≁ z */
  if (ee >= 1021) ef = shift_left(ef, ee - 1021);
  else ef = shift_right(ef, 1021 - ee, 1);
  return o.h < ef.h || (o.h == ef.h && o.l <= ef.l);

This code is used in section 50.

52. ⟨Unsubnormalize y and z, if they are subnormal 52⟩ ≡
  if (ye < 0 && yt != zro) yf = shift_left(y, 2), ye = 0;
  if (ze < 0 && zt != zro) zf = shift_left(z, 2), ze = 0;

This code is used in section 51.

53. At this point y ∼ z if and only if

  $yf + (-1)^{[ys = zs]}\, zf / 2^d \le 2^{55} \epsilon$.

We need to evaluate this relation without overstepping the bounds of our simulated
64-bit registers.

When d > 2, the difference of fraction parts might not fit exactly in an octabyte;
in that case the numbers are not similar unless ε > 3/8, and we replace the difference
by the ceiling of the true result. When ε < 1/8, our program essentially replaces $2^{55}\epsilon$
by $\lfloor 2^{55}\epsilon \rfloor$. These truncations are not needed simultaneously. Therefore the logic is
justified by the facts that, if n is an integer, we have x ≤ n if and only if ⌈x⌉ ≤ n; n ≤ x
if and only if n ≤ ⌊x⌋. (Notice that the concept of “sticky bit” is not appropriate
here.)

⟨Compute the difference of fraction parts, o 53⟩ ≡
  if (d > 54) o = zero_octa, oo = zf;
  else o = shift_right(zf, d, 1), oo = shift_left(o, d);
  if (oo.h != zf.h || oo.l != zf.l) { /* truncated result, hence d > 2 */
    if (ee < 1020) return 0; /* difference is too large for similarity */
    o = incr(o, ys == zs ? 0 : 1); /* adjust for ceiling */
  }
  o = (ys == zs ? ominus(yf, o) : oplus(yf, o));

This code is used in section 51.



54. Floating point output conversion. The print_float routine converts an
octabyte to a floating decimal representation that will be input as precisely the same
value.

⟨Subroutines 5⟩ +≡
  static void bignum_times_ten ARGS((bignum *));
  static void bignum_dec ARGS((bignum *, bignum *, tetra));
  static int bignum_compare ARGS((bignum *, bignum *));
  void print_float ARGS((octa));
  void print_float(x)
    octa x;
  {
    ⟨Local variables for print_float 56⟩;
    if (x.h & sign_bit) printf("-");
    ⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩;
    ⟨Store f and g as multiprecise integers 63⟩;
    ⟨Compute the significant digits s and decimal exponent e 64⟩;
    ⟨Print the significant digits with proper context 67⟩;
  }

55. One way to visualize the problem being solved here is to consider the vastly
simpler case in which there are only 2-bit exponents and 2-bit fractions. Then the
sixteen possible 4-bit combinations have the following interpretations:

  0000   [0 .. 0.125]
  0001   (0.125 .. 0.375)
  0010   [0.375 .. 0.625]
  0011   (0.625 .. 0.875)
  0100   [0.875 .. 1.125]
  0101   (1.125 .. 1.375)
  0110   [1.375 .. 1.625]
  0111   (1.625 .. 1.875)
  1000   [1.875 .. 2.25]
  1001   (2.25 .. 2.75)
  1010   [2.75 .. 3.25]
  1011   (3.25 .. 3.75)
  1100   [3.75 .. ∞]
  1101   NaN(0 .. 0.375)
  1110   NaN[0.375 .. 0.625]
  1111   NaN(0.625 .. 1)

Notice that the interval is closed, [f .. g], when the fraction part is even; it is open,
(f .. g), when the fraction part is odd. The printed outputs for these sixteen values,
if we actually were dealing with such short exponents and fractions, would be 0., .2,
.5, .7, 1., 1.2, 1.5, 1.7, 2., 2.5, 3., 3.5, Inf, NaN.2, NaN, NaN.8, respectively.

⟨Extract the exponent e and determine the fraction interval [f .. g] or (f .. g) 55⟩ ≡
  f = shift_left(x, 1);
  e = f.h >> 21;
  f.h &= 0x1fffff;
  if (!f.h && !f.l) ⟨Handle the special case when the fraction part is zero 57⟩
  else {
    g = incr(f, 1);
    f = incr(f, -1);
    if (!e) e = 1; /* subnormal */
    else if (e == 0x7ff) {
      printf("NaN");
      if (g.h == 0x100000 && g.l == 1) return; /* the “standard” NaN */
      e = 0x3ff; /* extreme NaNs come out OK even without adjusting f or g */
    } else f.h |= 0x200000, g.h |= 0x200000;
  }

This code is used in section 54.

56. ⟨Local variables for print_float 56⟩ ≡
  octa f, g; /* lower and upper bounds on the fraction part */
  register int e; /* exponent part */
  register int j, k; /* all purpose indices */

See also section 66.

This code is used in section 54.

57. The transition points between exponents correspond to powers of 2. At such
points the interval extends only half as far to the left of that power of 2 as it does to
the right. For example, in the 4-bit minifloat numbers considered above, case 1000
corresponds to the interval [1.875 .. 2.25].

⟨Handle the special case when the fraction part is zero 57⟩ ≡
  {
    if (!e) {
      printf("0."); return;
    }
    if (e == 0x7ff) {
      printf("Inf"); return;
    }
    e--;
    f.h = 0x3fffff, f.l = 0xffffffff;
    g.h = 0x400000, g.l = 2;
  }

This code is used in section 55.



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58 . We want to find the “simplest” value in the interval corresponding to the given 
number, in the sense that it has fewest significant digits when expressed in decimal 
notation. Thus, for example, if the floating point number can be described by a rela- 
tively short string such as ‘ . 1’ or ‘37el00’, we want to discover that representation. 

The basic idea is to generate the decimal representations of the two endpoints of 
the interval, outputting the leading digits where both endpoints agree, then making 
a final decision at the first place where they disagree. 

The “simplest” value is not always unique. For example, in the case of 4-bit 
minifloat numbers we could represent the bit pattern 0001 as either .2 or .3, and 
we could represent 1001 in five equally short ways: 2.3 or 2 . 4 or 2 . 5 or 2 . 6 or 2.7. 
The algorithm below tries to choose the middle possibility in such cases. 

[A solution to the analogous problem for fixed-point representations, without the 
additional complication of round-to-even, was used by the author in the program for 
TeX; see Beauty is Our Business (Springer, 1990), 233-242.] 

Suppose we are given two fractions f and g, where 0 ≤ f < g < 1, and we want
to compute the shortest decimal in the closed interval [f .. g]. If f = 0, we are done.
Otherwise let 10f = d + f′ and 10g = e + g′, where 0 ≤ f′ < 1 and 0 ≤ g′ < 1. If d < e,
we can terminate by outputting any of the digits d + 1, ..., e; otherwise we output
the common digit d = e, and repeat the process on the fractions 0 ≤ f′ < g′ < 1. A
similar procedure works with respect to the open interval (f .. g).
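
The closed-interval procedure can be sketched in ordinary C on small rational endpoints. This is only an illustration, not the multiprecision program that follows: the name shortest_decimal and the representation of f and g as a/den and b/den are assumptions made for the example, and the middle digit is chosen in the same way as in the code of section 64.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: writes the digits d1 d2 ... of the shortest decimal
   fraction .d1d2... in the closed interval [a/den .. b/den] into out,
   NUL-terminated. Assumes 0 <= a < b < den, with den small enough that
   10*b cannot overflow. An empty string stands for the value 0.
   Returns the number of digits written (truncated if cap is too small). */
size_t shortest_decimal(uint64_t a, uint64_t b, uint64_t den,
                        char *out, size_t cap)
{
    size_t n = 0;
    while (a != 0 && n + 1 < cap) {
        uint64_t d = (10 * a) / den, e = (10 * b) / den;
        a = (10 * a) % den;               /* f' = 10f - d */
        b = (10 * b) % den;               /* g' = 10g - e */
        if (d < e) {                      /* first place where the endpoints disagree */
            out[n++] = (char)('0' + ((d + 1 + e) >> 1));   /* middle of d+1..e */
            break;
        }
        out[n++] = (char)('0' + d);       /* common digit d = e */
    }
    out[n] = '\0';
    return n;
}
```

For instance, with f = .333 and g = .334 the digits 3 and 3 are common to both endpoints, and the final decision produces 4.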

59 . The program below carries out the stated algorithm by using multiprecision 
arithmetic on 77-place integers with 28 bits each. This choice facilitates multiplication 
by 10, and allows us to deal with the whole range of floating binary numbers using 
fixed point arithmetic. We keep track of the leading and trailing digit positions so 
that trivial operations on zeros are avoided. 

If f points to a bignum, its radix-2^28 digits are f->dat[0] through f->dat[76], from
most significant to least significant. We assume that all digit positions are zero unless
they lie in the subarray between indices f->a and f->b, inclusive. Furthermore, both
f->dat[f->a] and f->dat[f->b] are nonzero, unless f->a = f->b = bignum_prec − 1.

The bignum data type can be used with any radix less than 2^32; we will use it
later with radix 10^9. The dat array is made large enough to accommodate both
applications.

#define bignum_prec 157 /* would be 77 if we cared only about print_float */

⟨ Other type definitions 36 ⟩ +≡
  typedef struct {
    int a; /* index of the most significant digit */
    int b; /* index of the least significant digit; must be >= a */
    tetra dat[bignum_prec]; /* the digits; undefined except between a and b */
  } bignum;
60. Here, for example, is how we go from f to 10f, assuming that overflow will not
occur and that the radix is 2^28:

⟨ Subroutines 5 ⟩ +≡
  static void bignum_times_ten(f)
    bignum *f;
  {
    register tetra *p, *q;
    register tetra x, carry;
    for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
      x = *p * 10 + carry;
      *p = x & 0xfffffff; /* keep the low 28 bits */
      carry = x >> 28;
    }
    *p = carry;
    if (carry) f->a--;
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
  }

61. And here is how we test whether f < g, f = g, or f > g, using any radix
whatever:

⟨ Subroutines 5 ⟩ +≡
  static int bignum_compare(f, g)
    bignum *f, *g;
  {
    register tetra *p, *pp, *q, *qq;
    if (f->a != g->a) return f->a > g->a ? -1 : 1;
    pp = &f->dat[f->b], qq = &g->dat[g->b];
    for (p = &f->dat[f->a], q = &g->dat[g->a]; p <= pp; p++, q++) {
      if (*p != *q) return *p < *q ? -1 : 1;
      if (q == qq) return p < pp;
    }
    return -1;
  }



print_float: void (), §54.
tetra = unsigned int, §3.
62. The following subroutine subtracts g from f, assuming that f ≥ g > 0 and
using a given radix.

⟨ Subroutines 5 ⟩ +≡
  static void bignum_dec(f, g, r)
    bignum *f, *g;
    tetra r; /* the radix */
  {
    register tetra *p, *q, *qq;
    register int x, borrow;
    while (g->b > f->b) f->dat[++f->b] = 0;
    qq = &g->dat[g->a];
    for (p = &f->dat[g->b], q = &g->dat[g->b], borrow = 0; q >= qq; p--, q--) {
      x = *p - *q - borrow;
      if (x >= 0) borrow = 0, *p = x;
      else borrow = 1, *p = x + r;
    }
    for ( ; borrow; p--)
      if (*p) borrow = 0, *p = *p - 1;
      else *p = r - 1;
    while (f->dat[f->a] == 0) {
      if (f->a == f->b) { /* the result is zero */
        f->a = f->b = bignum_prec - 1, f->dat[bignum_prec - 1] = 0;
        return;
      }
      f->a++;
    }
    while (f->dat[f->b] == 0) f->b--;
  }

63 . Armed with these subroutines, we are ready to solve the problem. The first task 
is to put the numbers into bignum form. If the exponent is e, the number destined 
for digit dat[k] will consist of the rightmost 28 bits of the given fraction after it has 
been shifted right c − e − 28k bits, for some constant c. We choose c so that, when
e has its maximum value #7ff, the leading digit will go into position dat[1], and so
that when the number to be printed is exactly 1 the integer part of g will also be
exactly 1.

#define magic_offset 2112 /* the constant c that makes it work */
#define origin 37 /* the radix point follows dat[37] */

⟨ Store f and g as multiprecise integers 63 ⟩ ≡
  k = (magic_offset - e) / 28;
  ff.dat[k - 1] = shift_right(f, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k - 1] = shift_right(g, magic_offset + 28 - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k] = shift_right(f, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  gg.dat[k] = shift_right(g, magic_offset - e - 28 * k, 1).l & 0xfffffff;
  ff.dat[k + 1] = shift_left(f, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  gg.dat[k + 1] = shift_left(g, e + 28 * k - (magic_offset - 28)).l & 0xfffffff;
  ff.a = (ff.dat[k - 1] ? k - 1 : k);
  ff.b = (ff.dat[k + 1] ? k + 1 : k);
  gg.a = (gg.dat[k - 1] ? k - 1 : k);
  gg.b = (gg.dat[k + 1] ? k + 1 : k);

This code is used in section 54. 

64. If e is sufficiently small, the fractions f and g will be less than 1, and we can use
the stated algorithm directly. Of course, if e is extremely small, a lot of leading zeros
need to be lopped off; in the worst case, we may have to multiply f and g by 10 more
than 300 times. But hey, we don’t need to do that extremely often, and computers
are pretty fast nowadays.

In the small-exponent case, the computation always terminates before f becomes
zero, because the interval endpoints are fractions with denominator 2^t for some t > 50.

The invariant relations ff.dat[ff.a] ≠ 0 and gg.dat[gg.a] ≠ 0 are not maintained by
the computation here, when ff.a = origin or gg.a = origin. But no harm is done,
because bignum_compare is not used.

⟨ Compute the significant digits s and decimal exponent e 64 ⟩ ≡
  if (e > 0x401) ⟨ Compute the significant digits in the large-exponent case 65 ⟩
  else { /* if e <= 0x401 we have gg.a >= origin and gg.dat[origin] <= 8 */
    if (ff.a > origin) ff.dat[origin] = 0;
    for (e = 1, p = s; gg.a > origin || ff.dat[origin] == gg.dat[origin]; ) {
      if (gg.a > origin) e--;
      else *p++ = ff.dat[origin] + '0', ff.dat[origin] = 0, gg.dat[origin] = 0;
      bignum_times_ten(&ff);
      bignum_times_ten(&gg);
    }
    *p++ = ((ff.dat[origin] + 1 + gg.dat[origin]) >> 1) + '0'; /* the middle digit */
  }
  *p = '\0'; /* terminate the string s */

This code is used in section 54. 



a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
dat: tetra [], §59.
e: register int, §56.
f: octa, §56.
ff: bignum, §66.
g: octa, §56.
gg: bignum, §66.
k: register int, §56.
l: tetra, §3.
p: register char *, §66.
s: char [], §66.
shift_left: octa (), §7.
shift_right: octa (), §7.
tetra = unsigned int, §3.
65. When e is large, we use the stated algorithm by considering f and g to be
fractions whose denominator is a power of 10.

An interesting case arises when the number to be converted is #44ada56a4b0835bf,
since the interval turns out to be

(69999999999999991611392 .. 70000000000000000000000).

If this were a closed interval, we could simply give the answer 7e22; but the number
7e22 actually corresponds to #44ada56a4b0835c0 because of the round-to-even rule.
Therefore the correct answer is, say, 6.9999999999999995e22. This example shows
that we need a slightly different strategy in the case of open intervals; we cannot
simply look at the first position in which the endpoints have different decimal digits.
Therefore we change the invariant relation to 0 ≤ f < g ≤ 1 when open intervals are
involved, and we do not terminate the process when f = 0 or g = 1.
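
The two bit patterns quoted above can be checked on a host machine. The sketch below assumes that the host's double is IEEE 754 binary64 and that the compiler rounds decimal literals correctly; double_bits is a hypothetical helper, not part of the MMIX-ARITH program.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw 64-bit pattern of a double.
   Assumes double is IEEE 754 binary64, as on most current hosts. */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);   /* well-defined way to reinterpret the bytes */
    return u;
}
```

Under those assumptions double_bits(7e22) is 0x44ada56a4b0835c0, while double_bits(6.9999999999999995e22) is 0x44ada56a4b0835bf, which is why the shorter string 7e22 cannot be used for the latter number.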

⟨ Compute the significant digits in the large-exponent case 65 ⟩ ≡
  { register int open = x.l & 1;
    tt.dat[origin] = 10;
    tt.a = tt.b = origin;
    for (e = 1; bignum_compare(&gg, &tt) >= open; e++) bignum_times_ten(&tt);
    p = s;
    while (1) {
      bignum_times_ten(&ff);
      bignum_times_ten(&gg);
      for (j = '0'; bignum_compare(&ff, &tt) >= 0; j++)
        bignum_dec(&ff, &tt, 0x10000000), bignum_dec(&gg, &tt, 0x10000000);
      if (bignum_compare(&gg, &tt) >= open) break;
      *p++ = j;
      if (ff.a == bignum_prec - 1 && !open) goto done; /* f = 0 in a closed interval */
    }
    for (k = j; bignum_compare(&gg, &tt) >= open; k++)
      bignum_dec(&gg, &tt, 0x10000000);
    *p++ = (j + 1 + k) >> 1; /* the middle digit */
  done: ;
  }

This code is used in section 64. 

66. The length of string s will be at most 17. For if f and g agree to 17 places, we
have g/f < 1 + 10^-16; but the ratio g/f is always
≥ (1 + 2^-52 + 2^-53)/(1 + 2^-52 − 2^-53) > 1 + 2 × 10^-16.
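
A related, easily tested consequence is that 17 significant decimal digits always suffice to identify a double-precision number. The following sketch checks this with the host C library; it assumes IEEE 754 binary64 doubles and correctly rounding snprintf and strtod, and the function name is made up for the example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical check: returns 1 if printing x with 17 significant digits
   and reading the text back reproduces x exactly. Intended for finite x. */
int roundtrips_with_17_digits(double x)
{
    char buf[64];
    snprintf(buf, sizeof buf, "%.17g", x);   /* 17 significant digits */
    return strtod(buf, NULL) == x;
}
```

The routine print_float does better than this, of course, because it looks for the shortest string that still identifies the number.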

⟨ Local variables for print_float 56 ⟩ +≡

bignum ff, gg; /* fractions or numerators of fractions */ 
bignum tt; /* power of ten (used as the denominator) */ 
char s[18]; 
register char *p; 
67. At this point the significant digits are in string s, and s[0] ≠ '0'. If we put a
decimal point at the left of s, the result should be multiplied by 10^e.

We prefer the output ‘300.’ to the form ‘3e2’, and we prefer ‘.03’ to ‘3e-2’. In
general, the output will use an explicit exponent only if the alternative would take
more than 18 characters.

⟨ Print the significant digits with proper context 67 ⟩ ≡
  if (e > 17 || e < (int) strlen(s) - 17)
    printf("%c%s%se%d", s[0], (s[1] ? "." : ""), s + 1, e - 1);
  else if (e < 0) printf(".%0*d%s", -e, 0, s);
  else if (strlen(s) >= e) printf("%.*s.%s", e, s, s + e);
  else printf("%s%0*d.", s, e - (int) strlen(s), 0);

This code is used in section 54. 



a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_dec: static void (), §62.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
dat: tetra [], §59.
e: register int, §56.
j: register int, §56.
k: register int, §56.
l: tetra, §3.
origin = 37, §63.
print_float: void (), §54.
printf: int (), <stdio.h>.
strlen: size_t (), <string.h>.
x: octa, §54.
68. Floating point input conversion. Going the other way, we want to be able
to convert a given decimal number into its floating binary equivalent. The following
syntax is supported:

⟨digit⟩ → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
⟨digit string⟩ → ⟨digit⟩ | ⟨digit string⟩⟨digit⟩
⟨decimal string⟩ → ⟨digit string⟩. | .⟨digit string⟩ | ⟨digit string⟩.⟨digit string⟩
⟨optional sign⟩ → ⟨empty⟩ | + | -
⟨exponent⟩ → e⟨optional sign⟩⟨digit string⟩
⟨optional exponent⟩ → ⟨empty⟩ | ⟨exponent⟩
⟨floating magnitude⟩ → ⟨digit string⟩⟨exponent⟩ | ⟨decimal string⟩⟨optional exponent⟩ |
        Inf | NaN | NaN.⟨digit string⟩
⟨floating constant⟩ → ⟨optional sign⟩⟨floating magnitude⟩
⟨decimal constant⟩ → ⟨optional sign⟩⟨digit string⟩

For example, ‘-3.’ is the floating constant #c008000000000000; ‘1e3’ and ‘1000’ are
both equivalent to #408f400000000000; ‘NaN’ and ‘+NaN.5’ are both equivalent to
#7ff8000000000000.
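
The first two examples can be cross-checked against the host C library. This is only a sanity check under the assumption that the host uses IEEE 754 binary64 and a correctly rounding strtod; it says nothing about how scan_const itself works, and bits_of_decimal is a hypothetical name. NaN inputs are left out because strtod's treatment of NaN payloads differs from the convention used here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert decimal text with the host's strtod and
   return the resulting bit pattern (assumes IEEE 754 binary64 doubles). */
uint64_t bits_of_decimal(const char *text)
{
    double d = strtod(text, NULL);
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

With these assumptions, bits_of_decimal("-3.") yields 0xc008000000000000 and bits_of_decimal("1e3") yields 0x408f400000000000.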

The scan_const routine looks at a given string and finds the longest initial substring
that matches the syntax of either ⟨decimal constant⟩ or ⟨floating constant⟩. It puts
the corresponding value into the global octabyte variable val; it also puts the position
of the first unscanned character in the global pointer variable next_char. It returns 1
if a floating constant was found, 0 if a decimal constant was found, −1 if nothing was
found. A decimal constant that doesn’t fit in an octabyte is computed modulo 2^64.

The value of exceptions set by scan_const is not necessarily correct.

⟨ Subroutines 5 ⟩ +≡
  static void bignum_double ARGS((bignum *));
  int scan_const ARGS((char *));
  int scan_const(s)
    char *s;
  {
    ⟨ Local variables for scan_const 70 ⟩;
    val.h = val.l = 0;
    p = s;
    if (*p == '+' || *p == '-') sign = *p++; else sign = '+';
    if (strncmp(p, "NaN", 3) == 0) NaN = true, p += 3;
    else NaN = false;
    if ((isdigit(*p) && !NaN) || (*p == '.' && isdigit(*(p + 1))))
      ⟨ Scan a number and return 73 ⟩;
    if (NaN) ⟨ Return the standard NaN 71 ⟩;
    if (strncmp(p, "Inf", 3) == 0) ⟨ Return infinity 72 ⟩;
  no_const_found: next_char = s; return -1;
  }

69. ⟨ Global variables 4 ⟩ +≡
  octa val; /* value returned by scan_const */
  char *next_char; /* pointer returned by scan_const */

70. ⟨ Local variables for scan_const 70 ⟩ ≡
  register char *p, *q; /* for string manipulations */
  register bool NaN; /* are we processing a NaN? */
  int sign; /* '+' or '-' */

See also sections 76 and 81. 

This code is used in section 68. 

71. ⟨ Return the standard NaN 71 ⟩ ≡
  {
    next_char = p;
    val.h = 0x600000, exp = 0x3fe;
    goto packit;
  }

This code is used in section 68.

72. ⟨ Return infinity 72 ⟩ ≡
  {
    next_char = p + 3;
    goto make_it_infinite;
  }

This code is used in section 68. 



ARGS = macro ( ), §2.
bignum = struct, §59.
bool: enum, §1.
exceptions: int, §32.
exp: register int, §76.
false = 0, §1.
h: tetra, §3.
isdigit: int (), <ctype.h>.
l: tetra, §3.
make_it_infinite: label, §79.
octa = struct, §3.
packit: label, §78.
strncmp: int (), <string.h>.
true = 1, §1.
73. We saw above that a string of at most 17 digits is enough to characterize a
floating point number, for purposes of output. But a much longer buffer for digits
is needed when we’re doing input. For example, consider the borderline quantity
(1 + 2^-53)/2^1022; its decimal expansion, when written out exactly, is a number
with more than 750 significant digits: 2.2250738585...8125e-308. If any
one of those digits is increased, or if additional nonzero digits are added as in
2.2250738585...81250000001e-308, the rounded value is supposed to change from
#0010000000000000 to #0010000000000001.

We assume here that the user prefers a perfectly correct answer to a speedy almost-
correct one, so we implement the most general case.

⟨ Scan a number and return 73 ⟩ ≡
  {
    for (q = buf0, dec_pt = (char *) 0; isdigit(*p); p++) {
      val = oplus(val, shift_left(val, 2)); /* multiply by 5 */
      val = incr(shift_left(val, 1), *p - '0');
      if (q > buf0 || *p != '0')
        if (q < buf_max) *q++ = *p;
        else if (*(q - 1) == '0') *(q - 1) = *p;
    }
    if (NaN) *q++ = '1';
    if (*p == '.') ⟨ Scan a fraction part 74 ⟩;
    next_char = p;
    exp = 0;
    if (*p == 'e' && !NaN) ⟨ Scan an exponent 77 ⟩;
    if (dec_pt) ⟨ Return a floating point constant 78 ⟩;
    if (sign == '-') val = ominus(zero_octa, val);
    return 0;
  }

This code is used in section 68. 

74. ⟨ Scan a fraction part 74 ⟩ ≡
  {
    dec_pt = q;
    p++;
    for (zeros = 0; isdigit(*p); p++)
      if (*p == '0' && q == buf0) zeros++;
      else if (q < buf_max) *q++ = *p;
      else if (*(q - 1) == '0') *(q - 1) = *p;
  }

This code is used in section 73. 

75. The buffer needs room for eight digits of padding at the left, followed by up to
1022 + 53 − 307 significant digits, followed by a “sticky” digit at position buf_max − 1,
and eight more digits of padding.

#define buf0 (buf + 8)
#define buf_max (buf + 777)

⟨ Global variables 4 ⟩ +≡
  static char buf[785] = "00000000"; /* where we put significant input digits */
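
The “sticky” digit idea can be isolated in a few lines. The sketch below is a simplified stand-in for the buffering done in sections 73 and 74, with a hypothetical function name and a caller-supplied capacity instead of the fixed buf: digits that no longer fit are dropped, except that a nonzero one replaces a final ‘0’, so the stored digits still record that something nonzero followed.

```c
#include <stddef.h>

/* Hypothetical helper: copy the significant decimal digits of `in` into
   out[0..cap-1], skipping leading zeros. Digits beyond the capacity are
   not stored, but a nonzero one overwrites the last stored digit when
   that digit is '0' (the sticky position). Requires cap >= 1 and room for
   cap+1 chars in out. Returns the number of digits stored. */
size_t collect_digits(const char *in, char *out, size_t cap)
{
    size_t n = 0;
    for (; *in >= '0' && *in <= '9'; in++) {
        if (n == 0 && *in == '0') continue;               /* leading zero */
        if (n < cap) out[n++] = *in;                      /* still room */
        else if (out[n - 1] == '0') out[n - 1] = *in;     /* sticky digit */
    }
    out[n] = '\0';
    return n;
}
```

With cap = 3, the input 1200001 is stored as 121 rather than 120, which is enough to steer a later rounding decision in the right direction.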
76. ⟨ Local variables for scan_const 70 ⟩ +≡
  register char *dec_pt; /* position of decimal point in buf */
  register int exp; /* scanned exponent; later used for raw binary exponent */
  register int zeros; /* leading zeros removed after decimal point */

77. Here we don’t advance next_char and force a decimal point until we know that
a syntactically correct exponent exists.

The code here will convert extra-large inputs like ‘9e+9999999999999999’ into ∞
and extra-small inputs into zero. Strange inputs like ‘-00.0e9999999’ must also be
accommodated. (But we don’t try to deliver precise answers when there are a billion
or more leading zeros.)

⟨ Scan an exponent 77 ⟩ ≡
  { register char exp_sign;
    p++;
    if (*p == '+' || *p == '-') exp_sign = *p++; else exp_sign = '+';
    if (isdigit(*p)) {
      for (exp = *p++ - '0'; isdigit(*p); p++)
        if (exp < 100000000) exp = 10 * exp + *p - '0';
      if (!dec_pt) dec_pt = q, zeros = 0;
      if (exp_sign == '-') exp = -exp;
      next_char = p;
    }
  }

This code is used in section 73. 
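
The way an absurdly long exponent is kept from overflowing can be tried in isolation. The following stand-alone sketch copies the saturation test from the code above; the function name and the end-pointer interface are assumptions made for the example.

```c
#include <ctype.h>

/* Hypothetical helper: parse an optionally signed run of decimal digits,
   saturating instead of overflowing. Once the magnitude reaches 100000000,
   further digits are scanned but ignored. *end receives the address of the
   first unscanned character; if no digit is present, 0 is returned and
   *end is set to s. */
int scan_exponent(const char *s, const char **end)
{
    const char *p = s;
    int sign = '+', exp;
    if (*p == '+' || *p == '-') sign = *p++;
    if (!isdigit((unsigned char) *p)) { *end = s; return 0; }
    for (exp = *p++ - '0'; isdigit((unsigned char) *p); p++)
        if (exp < 100000000) exp = 10 * exp + (*p - '0');
    *end = p;
    return sign == '-' ? -exp : exp;
}
```

The largest value this can return is 999999999, which fits comfortably in a 32-bit int.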

78. ⟨ Return a floating point constant 78 ⟩ ≡
  {
    ⟨ Move the digits from buf to ff 79 ⟩;
    ⟨ Determine the binary fraction and binary exponent 83 ⟩;
  packit: ⟨ Pack and round the answer 84 ⟩;
    return 1;
  }

This code is used in section 73. 



ff: bignum, §81.
incr: octa ( ), §6.
isdigit: int (), <ctype.h>.
NaN: register bool, §70.
next_char: char *, §69.
ominus: octa (), §5.
oplus: octa (), §5.
p: register char *, §70.
q: register char *, §70.
scan_const: int (), §68.
shift_left: octa (), §7.
sign: int, §70.
val: octa, §69.
zero_octa: octa, §4.
79. Now we get ready to compute the binary fraction bits, by putting the scanned
input digits into a multiprecision fixed-point accumulator ff that spans the full
necessary range. After this step, the number that we want to convert to floating binary will
appear in ff.dat[ff.a], ff.dat[ff.a + 1], ..., ff.dat[ff.b]. The radix-10^9 digit in ff[36 − k]
is understood to be multiplied by 10^(9k), for 36 ≥ k ≥ −120.

⟨ Move the digits from buf to ff 79 ⟩ ≡
  x = buf + 341 + zeros - dec_pt - exp;
  if (q == buf0 || x >= 1413) {
  make_it_zero: exp = -99999; goto packit;
  }
  if (x < 0) {
  make_it_infinite: exp = 99999; goto packit;
  }
  ff.a = x / 9;
  for (p = q; p < q + 8; p++) *p = '0'; /* pad with trailing zeros */
  q = q - 1 - (q + 341 + zeros - dec_pt - exp) % 9; /* compute stopping place in buf */
  for (p = buf0 - x % 9, k = ff.a; p <= q && k <= 156; p += 9, k++)
    ⟨ Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80 ⟩;
  ff.b = k - 1;
  for (x = 0; p <= q; p += 9)
    if (strncmp(p, "000000000", 9) != 0) x = 1;
  ff.dat[156] += x; /* nonzero digits that fall off the right are sticky */
  while (ff.dat[ff.b] == 0) ff.b--;

This code is used in section 78. 

80. ⟨ Put the 9-digit number *p ... *(p + 8) into ff.dat[k] 80 ⟩ ≡
  {
    for (x = *p - '0', pp = p + 1; pp < p + 9; pp++) x = 10 * x + *pp - '0';
    ff.dat[k] = x;
  }

This code is used in section 79.

81. ⟨ Local variables for scan_const 70 ⟩ +≡
  register int k, x;
  register char *pp;
  bignum ff, tt;

82. Here’s a subroutine that is dual to bignum_times_ten. It changes f to 2f,
assuming that overflow will not occur and that the radix is 10^9.

⟨ Subroutines 5 ⟩ +≡
  static void bignum_double(f)
    bignum *f;
  {
    register tetra *p, *q;
    register int x, carry;
    for (p = &f->dat[f->b], q = &f->dat[f->a], carry = 0; p >= q; p--) {
      x = *p + *p + carry;
      if (x >= 1000000000) carry = 1, *p = x - 1000000000;
      else carry = 0, *p = x;
    }
    *p = carry;
    if (carry) f->a--;
    if (f->dat[f->b] == 0 && f->b > f->a) f->b--;
  }

83. ⟨ Determine the binary fraction and binary exponent 83 ⟩ ≡
  val = zero_octa;
  if (ff.a > 36) {
    for (exp = 0x3fe; ff.a > 36; exp--) bignum_double(&ff);
    for (k = 54; k; k--) {
      if (ff.dat[36]) {
        if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
        ff.dat[36] = 0;
        if (ff.b == 36) break; /* break if ff now zero */
      }
      bignum_double(&ff);
    }
  } else {
    tt.a = tt.b = 36, tt.dat[36] = 2;
    for (exp = 0x3fe; bignum_compare(&ff, &tt) >= 0; exp++) bignum_double(&tt);
    for (k = 54; k; k--) {
      bignum_double(&ff);
      if (bignum_compare(&ff, &tt) >= 0) {
        if (k >= 32) val.h |= 1 << (k - 32); else val.l |= 1 << k;
        bignum_dec(&ff, &tt, 1000000000);
        if (ff.a == bignum_prec - 1) break; /* break if ff now zero */
      }
    }
  }
  if (k == 0) val.l |= 1; /* add sticky bit if nonzero */

This code is used in section 78. 
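
The second loop above extracts one binary digit per doubling: double the numerator, compare it with the denominator, and subtract when it is at least as large. A small-scale sketch of the same idea on machine integers follows; fraction_bits is a hypothetical name, and the comments point to the bignum routines that play the corresponding roles.

```c
#include <stdint.h>

/* Hypothetical helper: return the first nbits bits of the binary expansion
   of num/den, assuming 0 <= num < den < 2^63 and 0 <= nbits <= 64. */
uint64_t fraction_bits(uint64_t num, uint64_t den, int nbits)
{
    uint64_t bits = 0;
    for (int k = 0; k < nbits; k++) {
        num *= 2;               /* corresponds to bignum_double(&ff) */
        bits <<= 1;
        if (num >= den) {       /* corresponds to bignum_compare(&ff, &tt) >= 0 */
            bits |= 1;
            num -= den;         /* corresponds to bignum_dec(&ff, &tt, radix) */
        }
    }
    return bits;
}
```

For example, the first eight bits of 1/10 come out as 00011001.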



a: int, §59.
b: int, §59.
bignum = struct, §59.
bignum_compare: static int (), §61.
bignum_dec: static void (), §62.
bignum_prec = 157, §59.
bignum_times_ten: static void (), §60.
buf: static char [], §75.
buf0 = macro, §75.
dat: tetra [], §59.
dec_pt: register char *, §76.
exp: register int, §76.
h: tetra, §3.
l: tetra, §3.
p: register char *, §70.
packit: label, §78.
q: register char *, §70.
scan_const: int (), §68.
strncmp: int (), <string.h>.
tetra = unsigned int, §3.
val: octa, §69.
zero_octa: octa, §4.
zeros: register int, §76.
84. We need to be careful that the input ‘NaN.999999999999999999999’ doesn’t
get rounded up; it is supposed to yield #7fffffffffffffff.

Although the input ‘NaN.0’ is illegal, strictly speaking, we silently convert it to
#7ff0000000000001, a number that would be output as ‘NaN.0000000000000002’.

⟨ Pack and round the answer 84 ⟩ ≡
  val = fpack(val, exp, sign, ROUND_NEAR);
  if (NaN) {
    if ((val.h & 0x7fffffff) == 0x40000000) val.h |= 0x7fffffff, val.l = 0xffffffff;
    else if ((val.h & 0x7fffffff) == 0x3ff00000 && !val.l) val.h |= 0x40000000, val.l = 1;
    else val.h |= 0x40000000;
  }

This code is used in section 78. 
85. Floating point remainders. In this section we implement the remainder of
the floating point operations, one of which happens to be the operation of taking the
remainder.

The easiest task remaining is to compare two floating point quantities. Routine
fcomp returns −1 if y < z, 0 if y = z, +1 if y > z, and +2 if y and z are unordered.

⟨ Subroutines 5 ⟩ +≡
  int fcomp ARGS((octa, octa));
  int fcomp(y, z)
    octa y, z;
  {
    ftype yt, zt;
    int ye, ze;
    char ys, zs;
    octa yf, zf;
    register int x;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    case 4 * nan + nan: case 4 * zro + nan: case 4 * num + nan: case 4 * inf + nan:
    case 4 * nan + zro: case 4 * nan + num: case 4 * nan + inf: return 2;
    case 4 * zro + zro: return 0;
    case 4 * zro + num: case 4 * num + zro: case 4 * zro + inf: case 4 * inf + zro:
    case 4 * num + num: case 4 * num + inf: case 4 * inf + num: case 4 * inf + inf:
      if (ys != zs) x = 1;
      else if (y.h > z.h) x = 1;
      else if (y.h < z.h) x = -1;
      else if (y.l > z.l) x = 1;
      else if (y.l < z.l) x = -1;
      else return 0;
      break;
    }
    return (ys == '-' ? -x : x);
  }
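
The four-way result convention is easy to mimic with host doubles, which may help when testing a port of fcomp. The sketch below is only an analogy under the assumption that the host follows IEEE 754 comparison rules; fcomp_host is a made-up name and does not touch octabytes at all.

```c
/* Hypothetical analog of fcomp on host doubles: returns -1, 0, or +1 for
   ordered operands and +2 when either operand is a NaN. */
int fcomp_host(double y, double z)
{
    if (y != y || z != z) return 2;   /* a NaN is the only value unequal to itself */
    if (y < z) return -1;
    if (y > z) return 1;
    return 0;                         /* equal, including +0 versus -0 */
}
```

As in fcomp, the two zeros compare equal regardless of sign.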



ARGS = macro ( ), §2.
exp: register int, §76.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
inf = 2, §36.
l: tetra, §3.
NaN: register bool, §70.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ROUND_NEAR = 4, §30.
sign: int, §70.
val: octa, §69.
zro = 0, §36.
86. Several MMIX operations act on a single floating point number and accept an
arbitrary rounding mode. For example, consider the operation of rounding to the
nearest floating point integer:

⟨ Subroutines 5 ⟩ +≡
  octa fintegerize ARGS((octa, int));
  octa fintegerize(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa xf, zf;
    zt = funpack(z, &zf, &ze, &zs);
    if (!r) r = cur_round;
    switch (zt) {
    case nan: if (!(z.h & 0x80000)) { exceptions |= I_BIT; z.h |= 0x80000; }
    case inf: case zro: return z;
    case num: ⟨ Integerize and return 87 ⟩;
    }
  }

87. ⟨ Integerize and return 87 ⟩ ≡
  if (ze >= 1074) return fpack(zf, ze, zs, ROUND_OFF); /* already an integer */
  if (ze <= 1020) xf.h = 0, xf.l = 1;
  else { octa oo;
    xf = shift_right(zf, 1074 - ze, 1);
    oo = shift_left(xf, 1074 - ze);
    if (oo.l != zf.l || oo.h != zf.h) xf.l |= 1; /* sticky bit */
  }
  switch (r) {
  case ROUND_DOWN: if (zs == '-') xf = incr(xf, 3); break;
  case ROUND_UP: if (zs != '-') xf = incr(xf, 3);
  case ROUND_OFF: break;
  case ROUND_NEAR: xf = incr(xf, xf.l & 4 ? 2 : 1); break;
  }
  xf.l &= 0xfffffffc;
  if (ze >= 1022) return fpack(shift_left(xf, 1074 - ze), ze, zs, ROUND_OFF);
  if (xf.l) xf.h = 0x3ff00000, xf.l = 0;
  if (zs == '-') xf.h |= sign_bit;
  return xf;

This code is used in section 86. 
88. To convert floating point to fixed point, we use fixit.

⟨ Subroutines 5 ⟩ +≡
  octa fixit ARGS((octa, int));
  octa fixit(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa zf, o;
    zt = funpack(z, &zf, &ze, &zs);
    if (!r) r = cur_round;
    switch (zt) {
    case nan: case inf: exceptions |= I_BIT; return z;
    case zro: return zero_octa;
    case num: if (funpack(fintegerize(z, r), &zf, &ze, &zs) == zro) return zero_octa;
      if (ze <= 1076) o = shift_right(zf, 1076 - ze, 1);
      else {
        if (ze > 1085 || (ze == 1085 && (zf.h > 0x400000 ||
              (zf.h == 0x400000 && (zf.l || zs != '-'))))) exceptions |= W_BIT;
        if (ze >= 1140) return zero_octa;
        o = shift_left(zf, ze - 1076);
      }
      return (zs == '-' ? ominus(zero_octa, o) : o);
    }
  }



ARGS = macro ( ), §2.
cur_round: int, §30.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
incr: octa ( ), §6.
inf = 2, §36.
l: tetra, §3.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
ROUND_DOWN = 3, §30.
ROUND_NEAR = 4, §30.
ROUND_OFF = 1, §30.
ROUND_UP = 2, §30.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
W_BIT = macro, §31.
zero_octa: octa, §4.
zro = 0, §36.
89. Going the other way, we can specify not only a rounding mode but whether the
given fixed point octabyte is signed or unsigned, and whether the result should be
rounded to short precision.

⟨ Subroutines 5 ⟩ +≡
  octa floatit ARGS((octa, int, int, int));
  octa floatit(z, r, u, p)
    octa z; /* octabyte to float */
    int r; /* rounding mode */
    int u; /* unsigned? */
    int p; /* short precision? */
  {
    int e; char s;
    register int t;
    exceptions = 0;
    if (!z.h && !z.l) return zero_octa;
    if (!r) r = cur_round;
    if (!u && (z.h & sign_bit)) s = '-', z = ominus(zero_octa, z); else s = '+';
    e = 1076;
    while (z.h < 0x400000) e--, z = shift_left(z, 1);
    while (z.h >= 0x800000) {
      e++;
      t = z.l & 1;
      z = shift_right(z, 1, 1);
      z.l |= t;
    }
    if (p) ⟨ Convert to short float 90 ⟩;
    return fpack(z, e, s, r);
  }

90. ⟨ Convert to short float 90 ⟩ ≡
  {
    register int ex; register tetra t;
    t = sfpack(z, e, s, r);
    ex = exceptions;
    sfunpack(t, &z, &e, &s);
    exceptions = ex;
  }

This code is used in section 89. 

91. The square root operation is more interesting.

⟨ Subroutines 5 ⟩ +≡
  octa froot ARGS((octa, int));
  octa froot(z, r)
    octa z; /* the operand */
    int r; /* the rounding mode */
  {
    ftype zt;
    int ze;
    char zs;
    octa x, xf, rf, zf;
    register int xe, k;
    if (!r) r = cur_round;
    zt = funpack(z, &zf, &ze, &zs);
    if (zs == '-' && zt != zro) exceptions |= I_BIT, x = standard_NaN;
    else switch (zt) {
    case nan: if (!(z.h & 0x80000)) exceptions |= I_BIT, z.h |= 0x80000;
      return z;
    case inf: case zro: x = z; break;
    case num: ⟨ Take the square root and return 92 ⟩;
    }
    if (zs == '-') x.h |= sign_bit;
    return x;
  }

92. The square root can be found by an adaptation of the old pencil-and-paper
method. If n = ⌊√s⌋, where s is an integer, we have s = n^2 + r where 0 ≤ r ≤ 2n;
this invariant can be maintained if we replace s by 4s + (0, 1, 2, 3) and n by 2n + (0, 1).
The following code implements this idea with 2n in xf and r in rf. (It could easily
be made to run about twice as fast.)

⟨ Take the square root and return 92 ⟩ ≡
  xf.h = 0, xf.l = 2;
  xe = (ze + 0x3fe) >> 1;
  if (ze & 1) zf = shift_left(zf, 1);
  rf.h = 0, rf.l = (zf.h >> 22) - 1;
  for (k = 53; k; k--) {
    rf = shift_left(rf, 2); xf = shift_left(xf, 1);
    if (k >= 43) rf = incr(rf, (zf.h >> (2 * (k - 43))) & 3);
    else if (k >= 27) rf = incr(rf, (zf.l >> (2 * (k - 27))) & 3);
    if ((rf.l > xf.l && rf.h >= xf.h) || rf.h > xf.h) {
      xf.l++; rf = ominus(rf, xf); xf.l++;
    }
  }
  if (rf.h || rf.l) xf.l++; /* sticky bit */
  return fpack(xf, xe, '+', r);

This code is used in section 91. 
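
The invariant s = n^2 + r with 0 ≤ r ≤ 2n is perhaps easiest to see on plain integers. The following sketch applies the same digit-by-digit method to a 64-bit integer, two bits of s at a time; isqrt64 is a hypothetical name, and unlike the code above it keeps n itself rather than 2n.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: returns n = floor(sqrt(s)) and, if rem is not NULL,
   stores r = s - n*n there. At every step the bits of s consumed so far
   equal n*n + r with 0 <= r <= 2n. */
uint32_t isqrt64(uint64_t s, uint64_t *rem)
{
    uint64_t n = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 2) | ((s >> (2 * i)) & 3);   /* s becomes 4s + (0,1,2,3) */
        n <<= 1;                               /* n becomes 2n */
        if (r > 2 * n) {                       /* r <= 2n would be violated */
            r -= 2 * n + 1;                    /* since (n+1)^2 = n^2 + 2n + 1 */
            n += 1;                            /* n becomes 2n + 1 */
        }
    }
    if (rem != NULL) *rem = r;
    return (uint32_t) n;
}
```

A nonzero final r plays the role of the sticky bit in the floating point version: it says that the root is inexact.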



ARGS = macro ( ), §2.
cur_round: int, §30.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
incr: octa ( ), §6.
inf = 2, §36.
l: tetra, §3.
nan = 3, §36.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
sfpack: tetra (), §34.
sfunpack: ftype (), §38.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
standard_NaN: octa, §4.
tetra = unsigned int, §3.
zero_octa: octa, §4.
zro = 0, §36.
93. And finally, the genuine floating point remainder. Subroutine fremstep either
calculates y rem z or reduces y to a smaller number having the same remainder with
respect to z. In the latter case the E_BIT is set in exceptions. A third parameter,
delta, gives a decrease in exponent that is acceptable for incomplete results; if delta
is sufficiently large, say 2500, the correct result will always be obtained in one step of
fremstep.

⟨ Subroutines 5 ⟩ +≡
  octa fremstep ARGS((octa, octa, int));
  octa fremstep(y, z, delta)
    octa y, z;
    int delta;
  {
    ftype yt, zt;
    int ye, ze;
    char xs, ys, zs;
    octa x, xf, yf, zf;
    register int xe, thresh, odd;
    yt = funpack(y, &yf, &ye, &ys);
    zt = funpack(z, &zf, &ze, &zs);
    switch (4 * yt + zt) {
    ⟨ The usual NaN cases 42 ⟩;
    case 4 * zro + zro: case 4 * num + zro: case 4 * inf + zro: case 4 * inf + num:
    case 4 * inf + inf: x = standard_NaN;
      exceptions |= I_BIT; break;
    case 4 * zro + num: case 4 * zro + inf: case 4 * num + inf: return y;
    case 4 * num + num: ⟨ Remainderize nonzero numbers and return 94 ⟩;
    zero_out: x = zero_octa;
    }
    if (ys == '-') x.h |= sign_bit;
    return x;
  }

94. If there’s a huge difference in exponents and the remainder is nonzero, this
computation will take a long time. One could compute (2^n y) rem z much more quickly
for large n by using O(log n) multiplications modulo z, but the floating remainder
operation isn’t important enough to justify such expensive hardware.

Results of floating remainder are always exact, so the rounding mode is immaterial.

⟨ Remainderize nonzero numbers and return 94 ⟩ ≡
  odd = 0; /* will be 1 if we’ve subtracted an odd multiple of z from y */
  thresh = ye - delta;
  if (thresh < ze) thresh = ze;
  while (ye >= thresh) ⟨ Reduce (ye, yf) by a multiple of zf; goto zero_out if the
      remainder is zero, goto try_complement if appropriate 95 ⟩;
  if (ye >= ze) {
    exceptions |= E_BIT; return fpack(yf, ye, ys, ROUND_OFF);
  }
  if (ye < ze - 1) return fpack(yf, ye, ys, ROUND_OFF);
  yf = shift_right(yf, 1, 1);
  try_complement: xf = ominus(zf, yf), xe = ze, xs = '+' + '-' - ys;
  if (xf.h > yf.h || (xf.h == yf.h && (xf.l > yf.l || (xf.l == yf.l && !odd)))) xf = yf, xs = ys;
  while (xf.h < 0x400000) xe--, xf = shift_left(xf, 1);
  return fpack(xf, xe, xs, ROUND_OFF);

This code is used in section 93. 
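
The decision made at try_complement, namely whether to keep the reduced value or to replace it by its complement with the opposite sign, has a simple integer analog. The sketch below is an illustration only, with a hypothetical function name; the parity of the truncated quotient plays the part of the variable odd.

```c
/* Hypothetical integer analog of the remainder computed by fremstep:
   returns y - z*q, where q is the integer nearest to y/z and ties go to
   the even q. Assumes z > 0 and that the result fits in a long. */
long rem_nearest_even(unsigned long y, unsigned long z)
{
    unsigned long q = y / z, r = y % z;     /* 0 <= r < z */
    /* Use the complement r - z when it is smaller in magnitude, or on an
       exact tie when the truncated quotient q is odd. */
    if (r > z - r || (r == z - r && (q & 1)))
        return (long) r - (long) z;
    return (long) r;
}
```

Thus 5 rem 2 is +1 (the quotient 2 is even), while 7 rem 2 is −1 (the quotient is rounded from 3.5 to 4).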

95. Here we are careful not to change the sign of y, because a remainder of 0 is
supposed to inherit the original sign of y.

⟨ Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto
    try_complement if appropriate 95 ⟩ ≡
  {
    if (yf.h == zf.h && yf.l == zf.l) goto zero_out;
    if (yf.h < zf.h || (yf.h == zf.h && yf.l < zf.l)) {
      if (ye == ze) goto try_complement;
      ye--, yf = shift_left(yf, 1);
    }
    yf = ominus(yf, zf);
    if (ye == ze) odd = 1;
    while (yf.h < 0x400000) ye--, yf = shift_left(yf, 1);
  }

This code is used in section 94. 



ARGS = macro ( ), §2.
E_BIT = macro, §31.
exceptions: int, §32.
fpack: octa (), §31.
ftype = enum, §36.
funpack: ftype (), §37.
h: tetra, §3.
I_BIT = macro, §31.
inf = 2, §36.
l: tetra, §3.
num = 1, §36.
octa = struct, §3.
ominus: octa (), §5.
ROUND_OFF = 1, §30.
shift_left: octa (), §7.
shift_right: octa (), §7.
sign_bit = macro, §4.
standard_NaN: octa, §4.
zero_octa: octa, §4.
zro = 0, §36.
96. Names of the sections.

( Add nonzero numbers and return 47) Used in section 46. 

( Adjust for difference in exponents 49 ) Used in section 47. 

( Check that x < z; otherwise give trivial answer 14) Used in section 13. 

( Compare two numbers with respect to epsilon and return 51 ) Used in section 50. 
( Compute the difference of fraction parts, o 53 ) Used in section 51. 

( Compute the significant digits in the large-exponent case 65 ) Used in section 64. 

( Compute the significant digits s and decimal exponent e 64 ) Used in section 54. 

( Convert to short float 90 ) Used in section 89. 

( Determine the binary fraction and binary exponent 83 ) Used in section 78. 

( Determine the number of significant places n in the divisor v 16 ) Used in section 13.
( Determine the quotient digit q[j] 20 ) Used in section 13.

( Divide nonzero numbers and return 45 ) Used in section 44. 

( Exchange y with z 48 ) Used in sections 47 and 51.

( Extract the exponent e and determine the fraction interval [f . . g] or [f . . g) 55 ) 
Used in section 54. 

( Find the trial quotient, q̂ 21 ) Used in section 20.

( Global variables 4, 9, 30, 32, 69, 75 ) Used in section 1.

( Handle the special case when the fraction part is zero 57 ) Used in section 55.

( If the result was negative, decrease q̂ by 1 23 ) Used in section 20.

( Integerize and return 87 ) Used in section 86.

( Local variables for print_float 56, 66 ) Used in section 54.

( Local variables for scan_const 70, 76, 81 ) Used in section 68.

( Move the digits from buf to ff 79 ) Used in section 78.

( Multiply nonzero numbers and return 43 ) Used in section 41. 

( Normalize the divisor 17) Used in section 13. 

( Other type definitions 36, 59 ) Used in section 1. 

( Pack and round the answer 84 ) Used in section 78. 

( Pack q and u to acc and aux 19 ) Used in section 13.

( Pack w into the outputs aux and acc 11 ) Used in section 8. 

( Print the significant digits with proper context 67 ) Used in section 54. 

( Put the 9-digit number *p . . . *(p + 8) into ff->dat[k] 80 ) Used in section 79.
( Reduce (ye, yf) by a multiple of zf; goto zero_out if the remainder is zero, goto
try_complement if appropriate 95 ) Used in section 94.

( Remainderize nonzero numbers and return 94 ) Used in section 93. 

( Return a floating point constant 78 ) Used in section 73. 

( Return infinity 72 ) Used in section 68. 

( Return the standard NaN 71 ) Used in section 68. 

( Round and return the result 33 ) Used in section 31. 

( Round and return the short result 35 ) Used in section 34. 

( Scan a fraction part 74) Used in section 73. 

( Scan a number and return 73 ) Used in section 68. 

( Scan an exponent 77 ) Used in section 73. 

( Store f and g as multiprecise integers 63 ) Used in section 54.

( Stuff for C preprocessor 2 ) Used in section 1. 
( Subroutines 5, 6, 7, 8, 12, 13, 24, 25, 26, 27, 28, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46, 50, 54, 60,
61, 62, 68, 82, 85, 86, 88, 89, 91, 93 ) Used in section 1.

( Subtract b^j q̂ v from u 22 ) Used in section 20.

( Take the square root and return 92 ) Used in section 91. 

( Tetrabyte and octabyte type definitions 3) Used in section 1. 

( The usual NaN cases 42 ) Used in sections 41, 44, 46, and 93. 

( Unnormalize the remainder 18 ) Used in section 13. 

( Unpack the dividend and divisor to u and v 15 ) Used in section 13.

( Unpack the multiplier and multiplicand to u and v 10 ) Used in section 8.

( Unsubnormalize y and z, if they are subnormal 52 ) Used in section 51.



MMIX-CONFIG 

1. Input format. Configuration files allow this simulator to adapt itself to
infinitely many possible combinations of hardware features. The purpose of the present
module is to read a configuration file, check it for validity, and set up the relevant
data structures.

All data in a configuration file consists simply of tokens separated by one or more 
units of white space, where a “token” is any sequence of nonspace characters that 
doesn’t contain a percent sign. Percent signs and anything following them on a line 
are ignored; this convention allows a user to include comments in the file. Here’s a
simple (but weird) example:

% Silly configuration
writebuffer 200
memaddresstime 100
Dcache associativity 4 lru
Dcache blocksize 1024 

unit ODD 5555555555555555555555555555555555555555555555555555555555555555 
unit EVEN aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 
div 40 30 20 % three-stage divide

It means that (1) the write buffer has capacity for 200 octabytes; (2) the memory bus 
takes 100 cycles to process an address; (3) there’s a D-cache, in which each set has 
4 blocks and the replacement policy is least-recently-used; (4) each block in the D- 
cache has 1024 bytes; (5) there are two functional units, one for all the odd-numbered 
opcodes and one for all the rest; (6) the division instructions take three pipeline stages, 
spending 40 cycles in the first stage, 30 in the second, and 20 in the last; (7) all other 
parameters have default values. 

2. Four kinds of specifications can appear in a configuration file, according to the 
following syntax: 

( specification ) → ( PV spec ) | ( cache spec ) | ( pipe spec ) | ( functional spec )
( PV spec ) → ( parameter ) ( decimal value )
( cache spec ) → ( cache name ) ( cache parameter ) ( decimal value ) ( policy )
( pipe spec ) → ( operation ) ( pipeline times )
( functional spec ) → unit ( name ) ( 64 hexadecimal digits )

3. A ( PV spec ) simply assigns a given value to a given parameter. The possibilities 
for ( parameter ) are as follows: 

• fetchbuffer (default 4), maximum instructions in the fetch buffer; must be ≥ 1.

• writebuffer (default 2), maximum octabytes in the write buffer; must be ≥ 1.

• reorderbuffer (default 5), maximum instructions issued but not committed; must
be ≥ 1.

• renameregs (default 5), maximum partial results in the reorder buffer; must be
≥ 1.

• memslots (default 2), maximum store instructions in the reorder buffer; must be
≥ 1.

• localregs (default 256), number of local registers in ring; must be 256, 512, or
1024.

• fetchmax (default 2), maximum instructions fetched per cycle; must be ≥ 1.

• dispatchmax (default 1), maximum instructions issued per cycle; must be ≥ 1.

• peekahead (default 1), maximum lookahead for jumps per cycle.

• commitmax (default 1), maximum instructions committed per cycle; must be ≥ 1.

• fremmax (default 1), maximum reductions in FREM computation per cycle; must be
≥ 1.

• denin (default 1), extra cycles taken if a floating point input is subnormal. 

• denout (default 1), extra cycles taken if a floating point result is subnormal. 

• writeholdingtime (default 0), minimum number of cycles for data to remain in 
the write buffer. 

• memaddresstime (default 20), cycles to process memory address; must be ≥ 1.

• memreadtime (default 20), cycles to read one memory busload; must be ≥ 1.

• memwritetime (default 20), cycles to write one memory busload; must be ≥ 1.

• membusbytes (default 8), number of bytes per memory busload; must be a power
of 2 that is 8 or more.

• branchpredictbits (default 0), number of bits in each branch prediction table
entry; must be ≤ 8.

• branchaddressbits (default 0), number of bits in instruction address used to 
index the branch prediction table. 

• branchhistorybits (default 0), number of bits in branch history used to index 
the branch prediction table. 

• branchdualbits (default 0), number of bits of instruction-address-xor-branch- 
history used to index the branch prediction table. 

• hardwarepagetable (default 1), is zero if page table calculations must be emulated 
by the operating system. 

• disablesecurity (default 0), is 1 if the hot-seat security checks are turned off. 
This option is used only for testing purposes; it means that the ‘s’ interrupt will not 
occur, and the ‘p’ interrupt will be signaled only when going from a nonnegative 
location to a negative one. 

• memchunksmax (default 1000), maximum number of 2^16-byte chunks of simulated
memory; must be ≥ 1.

• hashprime (default 2003), prime number used to address simulated memory; must
exceed memchunksmax, preferably by a factor of about 2.

The values of memchunksmax and hashprime affect only the speed of the simulator,
not its results, unless a very huge program is being simulated. The stated defaults
for memchunksmax and hashprime should be adequate for almost all applications.
4. A ( cache spec ) assigns a given value to a parameter affecting one of five possible
caches:

( cache spec ) → ( cache name ) ( cache parameter ) ( decimal value ) ( policy )
( cache name ) → ITcache | DTcache | Icache | Dcache | Scache
( policy ) → ( empty ) | random | serial | pseudolru | lru

The possibilities for ( cache parameter ) are as follows: 

• associativity (default 1), number of cache blocks per cache set; must be a power 
of 2. (A cache with associativity 1 is said to be “direct-mapped.”) 

• blocksize (default 8), number of bytes per cache block; must be a power of 2, at 
least equal to the granularity, and at most equal to 8192. The blocksize of ITcache 
and DTcache must be 8. 

• setsize (default 1), number of sets of cache blocks; must be a power of 2. (A 
cache with set size 1 is said to be “fully associative.”) 

• granularity (default 8), number of bytes per “dirty bit,” used to remember which 
items of data have changed since they were read from memory; must be a power 
of 2 and at least 8. The granularity must be 8 if writeallocate is 0. 

• victimsize (default 0), number of cache blocks in the victim buffer, which holds 
blocks removed from the main cache sets; must be zero or a power of 2. 

• writeback (default 0), is 1 in a “write-back” cache, which holds dirty data as 
long as possible; is 0 in a “write-through” cache, which cleans all data as soon as 
possible. 

• writeallocate (default 0), is 1 in a “write-allocate” cache, which remembers all 
recently written data; is 0 in a “write-around” cache, which doesn’t make space for 
newly written data that fails to hit an existing cache block. 

• accesstime (default 1), number of cycles to query the cache; must be ≥ 1. (Hits
in the S-cache actually require twice the accesstime, once to query the tag and once
to transmit the data.)

• copyintime (default 1), number of cycles to move a cache block from its input
buffer into the cache proper; must be ≥ 1.

• copyouttime (default 1), number of cycles to move a cache block from the cache
proper to its output buffer; must be ≥ 1.

• ports (default 1), number of processes that can simultaneously query the cache;
must be ≥ 1.

The ( policy ) parameter should be nonempty only on cache specifications for parameters
associativity and victimsize. If no replacement policy is specified, random is
the default. All four policies are equivalent when the associativity or victimsize
is 1; pseudolru is equivalent to lru when the associativity or victimsize is 2.

The granularity, writeback, writeallocate, and copyouttime parameters affect
the performance only of the D-cache and S-cache; the other three caches are
read-only, so they never need to write their data.

The ports parameter affects the performance of the D-cache and DT-cache, and
(if the PREGO command is used) the performance of the I-cache and IT-cache. The
S-cache accommodates only one process at a time, regardless of the number of specified
ports.

Only the translation caches (the IT-cache and DT-cache) are present by default.
But if any specifications are given for, say, an I-cache, all of the unspecified Icache
parameters take their default values.

The existence of an S-cache (secondary cache) implies the existence of both I- 
cache and D-cache (primary caches for instructions and data). The block size of the 
secondary cache must not be less than the block size of the primary caches. The 
secondary cache must have the same granularity as the D-cache. 
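The cache parameters combine in a simple way: each set holds associativity blocks of blocksize bytes, and there are setsize sets. The sketch below computes the resulting capacity and checks the block-size rule for an S-cache; the function names are hypothetical and are not part of the simulator.

```c
#include <assert.h>

/* True if n is a power of 2 (n >= 1). */
static int is_power_of_two(int n) { return n >= 1 && (n & (n - 1)) == 0; }

/* Data capacity in bytes of the main cache sets, excluding any victim buffer. */
long cache_capacity(int associativity, int blocksize, int setsize)
{
    assert(is_power_of_two(associativity));
    assert(is_power_of_two(blocksize) && blocksize >= 8 && blocksize <= 8192);
    assert(is_power_of_two(setsize));
    return (long)associativity * blocksize * setsize;
}

/* The S-cache block size must not be less than that of either primary cache. */
int scache_blocksize_ok(int s_blocksize, int i_blocksize, int d_blocksize)
{
    return s_blocksize >= i_blocksize && s_blocksize >= d_blocksize;
}
```

With the “silly” configuration of section 1 (associativity 4, blocksize 1024, and the default setsize of 1), cache_capacity(4, 1024, 1) describes a fully associative D-cache of 4096 bytes.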

5. A (pipe spec) governs the execution time of potentially slow operations. 

( pipe spec ) → ( operation ) ( pipeline times )
( pipeline times ) → ( decimal value ) | ( pipeline times ) ( decimal value )

Here the ( operation ) is one of the following:

• mul0 through mul8 (default 10); the values for mulj refer to products in which the
second operand is less than 2^(8j), where j is as small as possible. Thus, for example,
mul1 applies to nonzero one-byte multipliers.

• div (default 60); this applies to integer division, signed and unsigned. 

• sh (default 1); this applies to left and right shifts, signed and unsigned. 

• mux (default 1); the multiplex operator. 

• sadd (default 1); the sideways addition operator. 

• mor (default 1); the boolean matrix multiplication operators MOR and MXOR. 

• fadd (default 4); floating point addition and subtraction. 

• fmul (default 4); floating point multiplication. 

• fdiv (default 40); floating point division. 

• fsqrt (default 40); floating point square root. 

• fint (default 4); floating point integerization. 

• fix (default 2); conversion from floating to fixed, signed and unsigned. 

• flot (default 2); conversion from fixed to floating, signed and unsigned.

• feps (default 4); floating comparison with respect to epsilon.

In each case one can specify a sequence of pipeline stages, with a positive number of 
cycles to be spent in each stage. For example, a specification like ‘fmul 3 1’ would 
say that a functional unit that supports FMUL takes a total of four cycles to compute 
the floating point product in two stages; it can start working on a second product 
after three cycles have gone by. 

If a floating point operation has a subnormal input, denin is added to the time for 
the first stage. If a floating point operation has a subnormal result, denout is added 
to the time for the last stage. 
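The timing rules just stated can be sketched as follows. The function names are hypothetical. The restart rule generalizes the ‘fmul 3 1’ example on the assumption that a unit can accept a new operation as soon as its first stage is free; the text above states this only for that example.

```c
#include <assert.h>

/* Total cycles for one operation: the sum of all stage times,
   plus the subnormal penalties denin and denout when they apply. */
int pipe_latency(const int *stage, int nstages,
                 int subnormal_in, int subnormal_out,
                 int denin, int denout)
{
    int k, total = 0;
    assert(nstages >= 1);
    for (k = 0; k < nstages; k++) total += stage[k];
    if (subnormal_in) total += denin;    /* added to the first stage */
    if (subnormal_out) total += denout;  /* added to the last stage */
    return total;
}

/* Cycles before the same unit can start a second operation (assumed to be
   the time spent in the first stage, as in the 'fmul 3 1' example). */
int pipe_restart(const int *stage, int nstages)
{
    assert(nstages >= 1);
    return stage[0];
}
```

For the stage list {3, 1}, pipe_latency gives four cycles when no operand is subnormal, and pipe_restart gives three.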
6. The fourth and final kind of specification defines a functional unit:

( functional spec ) → unit ( name ) ( 64 hexadecimal digits )

The symbolic name should be at most fifteen characters long. The 64 hexadecimal
digits contain 256 bits, with ‘1’ for each supported opcode; the most significant
(leftmost) bit is for opcode 0 (TRAP), and the least significant bit is for opcode 255
(TRIP).

For example, we can define a load/store unit (which handles register/memory
operations), a multiplication unit (which handles fixed and floating point multiplication), a
boolean unit (which handles only bitwise operations), and a more general
arithmetic-logical unit, as follows:

unit LSU 00000000000000000000000000000000fffffffcfffffffc0000000000000000
unit MUL 000080f000000000000000000000000000000000000000000000000000000000
unit BIT 000000000000000000000000000000000000000000000000ffff00ff00ff0000
unit ALU f0000000ffffffffffffffffffffffff0000000300000003ffffffffffffffff 

The order in which units are specified is important, because MMIX’s dispatcher 
will try to match each instruction with the first functional unit that supports its 
opcode. Therefore it is best to list more specialized units (like the BIT unit in this 
example) before more general ones; this lets the specialized units have first chance at 
the instructions they can handle. 

There can be any number of functional units, having possibly identical specifications.
One should, however, give each unit a unique name (e.g., ALU1 and ALU2 if there
are two arithmetic-logical units), since these names are used in diagnostic messages.
Opcodes that aren’t supported by any specified unit will cause an emulation trap.
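The bit numbering of a functional spec can be made concrete with a small sketch. It assumes the 256 bits are kept in eight 32-bit words, most significant word first, which matches the way the background unit is set up in section 18; the names parse_unit_bits and unit_supports are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Parse 64 hexadecimal digits into eight 32-bit words, leftmost digits first.
   Returns 0 on success, -1 if the string is malformed. */
int parse_unit_bits(const char *hex, uint32_t ops[8])
{
    int k;
    if (strlen(hex) != 64) return -1;
    for (k = 0; k < 8; k++) ops[k] = 0;
    for (k = 0; k < 64; k++) {
        int c = (unsigned char)hex[k], d;
        if (!isxdigit(c)) return -1;
        d = isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
        ops[k / 8] = (ops[k / 8] << 4) | (uint32_t)d;
    }
    return 0;
}

/* Return 1 if the unit supports the given opcode (0..255), else 0;
   opcode 0 is the leftmost bit of ops[0], opcode 255 the rightmost of ops[7]. */
int unit_supports(const uint32_t ops[8], int opcode)
{
    assert(opcode >= 0 && opcode <= 255);
    return (int)((ops[opcode >> 5] >> (31 - (opcode & 31))) & 1);
}
```

Applied to the MUL unit above, the first word comes out as 0x000080f0, so the bits for opcodes 0x10 (FMUL) and 0x18 through 0x1b (MUL, MULI, MULU, MULUI) are set and no others.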

7. Full details about the significance of all these parameters can be found in the 
mmix-pipe module, which defines and discusses the data structures that need to be 
configured and initialized. 

Of course the specifications in a configuration file needn’t make any sense, nor need 
they be practically achievable. We could, for example, specify a unit that handles 
only the two opcodes NXOR and DIVUI; we could specify 1-cycle division but pipelined 
100-cycle shifts, or 1-cycle memory access but 100-cycle cache access. We could create 
a thousand rename registers and issue a hundred instructions per cycle, etc. Some 
combinations of parameters are clearly ridiculous. 

But there remain a huge number of possibilities of interest, especially as technology 
continues to evolve. By experimenting with configurations that are extreme by 
present-day standards, we can see how much might be gained if the corresponding 
hardware could be built economically. 
8. Basic input/output. Let’s get ready to program the MMIX_config subroutine
by building some simple infrastructure. First we need some macros to print error
messages.

#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define errprint3(f, a, b, c) fprintf(stderr, f, a, b, c)
#define panic(x) { x; errprint0("\n"); exit(-1); }

9. And we need a place to look at the input. 

#define BUF_SIZE 100   /* we don’t need long lines */

( Global variables 9 ) =
  FILE *config_file;            /* input comes from here */
  char buffer[BUF_SIZE];        /* input lines go here */
  char token[BUF_SIZE];         /* and tokens are copied to here */
  char *buf_pointer = buffer;   /* this is our current position */
  bool token_prescanned;        /* does token contain the next token already? */

See also sections 15 and 28. 

This code is used in section 38. 



bool = enum, MMIX-PIPE §11.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
fprintf: int (), <stdio.h>.
MMIX_config: void (), §38.
stderr: FILE *, <stdio.h>.
10. The get_token routine copies the next token of input into the token buffer. After
the input has ended, a final ‘end’ is appended.

( Subroutines 10 ) =
  static void get_token ARGS((void));
  static void get_token()   /* set token to the next token of the configuration file */
  {
    register char *p, *q;
    if (token_prescanned) {
      token_prescanned = false; return;
    }
    while (1) {   /* scan past white space */
      if (*buf_pointer == '\0' || *buf_pointer == '\n' || *buf_pointer == '%') {
        if (!fgets(buffer, BUF_SIZE, config_file)) {
          strcpy(token, "end"); return;
        }
        if (strlen(buffer) == BUF_SIZE - 1 && buffer[BUF_SIZE - 2] != '\n')
          panic(errprint1("config file line too long: `%s...'", buffer));
        buf_pointer = buffer;
      } else if (!isspace(*buf_pointer)) break;
      else buf_pointer++;
    }
    for (p = buf_pointer, q = token; !isspace(*p) && *p != '%'; p++, q++) *q = *p;
    buf_pointer = p; *q = '\0';
    return;
  }

See also sections 11, 16, 22, 23, 30, and 31. 

This code is used in section 38. 
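The token convention of section 1 can be tried out in isolation with the following sketch, which scans an in-memory string instead of a file. It is not the routine above: the name next_token and its interface are hypothetical, overlong tokens are silently truncated rather than reported, and no final ‘end’ token is appended.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy the next token of *src into tok (capacity cap) and advance *src.
   Returns 1 if a token was found, 0 at end of input.
   A '%' starts a comment that runs to the end of the line. */
int next_token(const char **src, char *tok, size_t cap)
{
    const char *p = *src;
    size_t n = 0;
    assert(cap > 0);
    for (;;) {
        while (isspace((unsigned char)*p)) p++;
        if (*p != '%') break;
        while (*p != '\0' && *p != '\n') p++;   /* skip the comment */
    }
    if (*p == '\0') { *src = p; return 0; }
    while (*p != '\0' && *p != '%' && !isspace((unsigned char)*p)) {
        if (n + 1 < cap) tok[n++] = *p;         /* truncate if too long */
        p++;
    }
    tok[n] = '\0';
    *src = p;
    return 1;
}
```

Calling next_token repeatedly on the text of the “silly” example yields writebuffer, 200, memaddresstime, 100, and so on, with both comments skipped.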

11. The get_int routine is called when we wish to input a decimal value. It returns
-1 if the next token isn’t a string of decimal digits.

( Subroutines 10 ) +=
  static int get_int ARGS((void));
  static int get_int()
  { int v;
    char *p;
    get_token();
    for (p = token, v = 0; *p >= '0' && *p <= '9'; p++) v = 10 * v + *p - '0';
    if (*p) return -1;
    return v;
  }
12. A simple data structure makes it fairly easy to deal with parameter/value
specifications.

( Type definitions 12 ) =
  typedef struct {
    char name[20];        /* symbolic name */
    int *v;               /* internal name */
    int defval;           /* default value */
    int minval, maxval;   /* minimum and maximum legal values */
    bool power_of_two;    /* must it be a power of two? */
  } pv_spec;

See also sections 13 and 14. 

This code is used in section 38. 

13. Cache parameters are a bit more difficult, but still not bad.

( Type definitions 12 ) +=
  typedef enum {
    assoc, blksz, setsz, gran, vctsz, wrb, wra, acctm, citm, cotm, prts
  } c_param;
  typedef struct {
    char name[20];        /* symbolic name */
    c_param v;            /* internal code */
    int defval;           /* default value */
    int minval, maxval;   /* minimum and maximum legal values */
    bool power_of_two;    /* must it be a power of two? */
  } cpv_spec;

14. Operation codes are the easiest of all.

( Type definitions 12 ) +=
  typedef struct {
    char name[8];         /* symbolic name */
    internal_opcode v;    /* internal code */
    int defval;           /* default value */
  } op_spec;



ARGS = macro ( ), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buf_pointer: char *, §9.
BUF_SIZE = 100, §9.
buffer: char [], §9.
config_file: FILE *, §9.
errprint1 = macro (), §8.
false = 0, MMIX-PIPE §11.
fgets: char *(), <stdio.h>.
internal_opcode = enum, MMIX-PIPE §49.
isspace: int (), <ctype.h>.
panic = macro (), §8.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
15. Most of the parameters are external variables that are declared in the header
file mmix-pipe.h; but some are private to this module. Here we define the main tables
used below.

( Global variables 9 ) += 

  int fetch_buf_size, write_buf_size, reorder_buf_size, mem_bus_bytes, hardware_PT;
  int max_cycs = 60;
  pv_spec PV[] = {
    {"fetchbuffer", &fetch_buf_size, 4, 1, INT_MAX, false},
    {"writebuffer", &write_buf_size, 2, 1, INT_MAX, false},
    {"reorderbuffer", &reorder_buf_size, 5, 1, INT_MAX, false},
    {"renameregs", &max_rename_regs, 5, 1, INT_MAX, false},
    {"memslots", &max_mem_slots, 2, 1, INT_MAX, false},
    {"localregs", &lring_size, 256, 256, 1024, true},
    {"fetchmax", &fetch_max, 2, 1, INT_MAX, false},
    {"dispatchmax", &dispatch_max, 1, 1, INT_MAX, false},
    {"peekahead", &peekahead, 1, 0, INT_MAX, false},
    {"commitmax", &commit_max, 1, 1, INT_MAX, false},
    {"fremmax", &frem_max, 1, 1, INT_MAX, false},
    {"denin", &denin_penalty, 1, 0, INT_MAX, false},
    {"denout", &denout_penalty, 1, 0, INT_MAX, false},
    {"writeholdingtime", &holding_time, 0, 0, INT_MAX, false},
    {"memaddresstime", &mem_addr_time, 20, 1, INT_MAX, false},
    {"memreadtime", &mem_read_time, 20, 1, INT_MAX, false},
    {"memwritetime", &mem_write_time, 20, 1, INT_MAX, false},
    {"membusbytes", &mem_bus_bytes, 8, 8, INT_MAX, true},
    {"branchpredictbits", &bp_n, 0, 0, 8, false},
    {"branchaddressbits", &bp_a, 0, 0, 32, false},
    {"branchhistorybits", &bp_b, 0, 0, 32, false},
    {"branchdualbits", &bp_c, 0, 0, 32, false},
    {"hardwarepagetable", &hardware_PT, 1, 0, 1, false},
    {"disablesecurity", (int *) &security_disabled, 0, 0, 1, false},
    {"memchunksmax", &mem_chunks_max, 1000, 1, INT_MAX, false},
    {"hashprime", &hash_prime, 2003, 2, INT_MAX, false}};
  cpv_spec CPV[] = {
    {"associativity", assoc, 1, 1, INT_MAX, true},
    {"blocksize", blksz, 8, 8, 8192, true},
    {"setsize", setsz, 1, 1, INT_MAX, true},
    {"granularity", gran, 8, 8, 8192, true},
    {"victimsize", vctsz, 0, 0, INT_MAX, true},
    {"writeback", wrb, 0, 0, 1, false},
    {"writeallocate", wra, 0, 0, 1, false},
    {"accesstime", acctm, 1, 1, INT_MAX, false},
    {"copyintime", citm, 1, 1, INT_MAX, false},
    {"copyouttime", cotm, 1, 1, INT_MAX, false},
    {"ports", prts, 1, 1, INT_MAX, false}};
  op_spec OP[] = {
    {"mul0", mul0, 10}, {"mul1", mul1, 10}, {"mul2", mul2, 10}, {"mul3", mul3, 10},
    {"mul4", mul4, 10}, {"mul5", mul5, 10}, {"mul6", mul6, 10}, {"mul7", mul7, 10},
    {"mul8", mul8, 10},
    {"div", div, 60}, {"sh", sh, 1}, {"mux", mux, 1}, {"sadd", sadd, 1}, {"mor", mor, 1},
    {"fadd", fadd, 4}, {"fmul", fmul, 4}, {"fdiv", fdiv, 40}, {"fsqrt", fsqrt, 40},
    {"fint", fint, 4},
    {"fix", fix, 2}, {"flot", flot, 2}, {"feps", feps, 4}};
  int PV_size, CPV_size, OP_size;   /* the number of entries in PV, CPV, OP */



acctm = 7, §13.
assoc = 0, §13.
blksz = 1, §13.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
citm = 8, §13.
commit_max: int, MMIX-PIPE §59.
cotm = 9, §13.
cpv_spec = struct, §13.
denin_penalty: int, MMIX-PIPE §349.
denout_penalty: int, MMIX-PIPE §349.
dispatch_max: int, MMIX-PIPE §59.
div = 9, MMIX-PIPE §49.
fadd = 14, MMIX-PIPE §49.
false = 0, MMIX-PIPE §11.
fdiv = 16, MMIX-PIPE §49.
feps = 21, MMIX-PIPE §49.
fetch_max: int, MMIX-PIPE §59.
fint = 18, MMIX-PIPE §49.
fix = 19, MMIX-PIPE §49.
flot = 20, MMIX-PIPE §49.
fmul = 15, MMIX-PIPE §49.
frem_max: int, MMIX-PIPE §349.
fsqrt = 17, MMIX-PIPE §49.
gran = 3, §13.
hash_prime: int, MMIX-PIPE §207.
holding_time: int, MMIX-PIPE §247.
INT_MAX = macro, <limits.h>.
lring_size: int, MMIX-PIPE §86.
max_mem_slots: int, MMIX-PIPE §86.
max_rename_regs: int, MMIX-PIPE §86.
mem_addr_time: int, MMIX-PIPE §214.
mem_chunks_max: int, MMIX-PIPE §207.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
mor = 13, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul1 = 1, MMIX-PIPE §49.
mul2 = 2, MMIX-PIPE §49.
mul3 = 3, MMIX-PIPE §49.
mul4 = 4, MMIX-PIPE §49.
mul5 = 5, MMIX-PIPE §49.
mul6 = 6, MMIX-PIPE §49.
mul7 = 7, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
mux = 11, MMIX-PIPE §49.
op_spec = struct, §14.
peekahead: int, MMIX-PIPE §59.
prts = 10, §13.
pv_spec = struct, §12.
sadd = 12, MMIX-PIPE §49.
security_disabled: bool, MMIX-PIPE §66.
setsz = 2, §13.
sh = 10, MMIX-PIPE §49.
true = 1, MMIX-PIPE §11.
vctsz = 4, §13.
wra = 6, §13.
wrb = 5, §13.
16. The new_cache routine creates a cache structure with default values. (These
default values are “hard-wired” into the program, not actually read from the CPV
table.)

( Subroutines 10 ) +=
  static cache *new_cache ARGS((char *));
  static cache *new_cache(name)
    char *name;
  { register cache *c = (cache *) calloc(1, sizeof(cache));
    if (!c) panic(errprint1("Can't allocate %s", name));
    c->aa = 1;            /* default associativity, should equal CPV[0].defval */
    c->bb = 8;            /* default blocksize */
    c->cc = 1;            /* default setsize */
    c->gg = 8;            /* default granularity */
    c->vv = 0;            /* default victimsize */
    c->repl = random;     /* default replacement policy */
    c->vrepl = random;    /* default victim replacement policy */
    c->mode = 0;          /* default mode is write-through and write-around */
    c->access_time = c->copy_in_time = c->copy_out_time = 1;
    c->filler.ctl = &(c->filler_ctl);
    c->filler_ctl.ptr_a = (void *) c;
    c->filler_ctl.go.o.l = 4;
    c->flusher.ctl = &(c->flusher_ctl);
    c->flusher_ctl.ptr_a = (void *) c;
    c->flusher_ctl.go.o.l = 4;
    c->ports = 1;
    c->name = name;
    return c;
  }



17. ( Initialize to defaults 17 ) =
  PV_size = (sizeof PV) / sizeof(pv_spec);
  CPV_size = (sizeof CPV) / sizeof(cpv_spec);
  OP_size = (sizeof OP) / sizeof(op_spec);
  ITcache = new_cache("ITcache");
  DTcache = new_cache("DTcache");
  Icache = Dcache = Scache = NULL;
  for (j = 0; j < PV_size; j++) *(PV[j].v) = PV[j].defval;
  for (j = 0; j < OP_size; j++) {
    pipe_seq[OP[j].v][0] = OP[j].defval;
    pipe_seq[OP[j].v][1] = 0;   /* one stage */
  }

This code is used in section 38. 
18. Reading the specs. Before we’re ready to process the configuration file, we
need to count the number of functional units, so that we know how much space to
allocate for them.

A special background unit is always provided, just to make sure that TRAP and
TRIP instructions are handled by somebody.

( Count and allocate the functional units 18 ) =
  funit_count = 0;
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "unit") == 0) {
      funit_count++;
      get_token(); get_token();   /* a unit might be named unit or end */
    }
  }
  funit = (func *) calloc(funit_count + 1, sizeof(func));
  if (!funit) panic(errprint0("Can't allocate the functional units"));
  strcpy(funit[funit_count].name, "%%");
  funit[funit_count].ops[0] = 0x80000000;   /* TRAP */
  funit[funit_count].ops[7] = 0x1;          /* TRIP */

This code is used in section 38. 



aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
ARGS = macro (), MMIX-PIPE §6.
bb: int, MMIX-PIPE §167.
cache = struct, MMIX-PIPE §167.
calloc: void *(), <stdlib.h>.
cc: int, MMIX-PIPE §167.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
cpv_spec = struct, §13.
ctl: control *, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
defval: int, §14.
defval: int, §12.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
filler: coroutine, MMIX-PIPE §167.
filler_ctl: control, MMIX-PIPE §167.
flusher: coroutine, MMIX-PIPE §167.
flusher_ctl: control, MMIX-PIPE §167.
func: struct, MMIX-PIPE §76.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
go = 72, MMIX-PIPE §49.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
l: tetra, MMIX-PIPE §17.
mode: int, MMIX-PIPE §167.
name: char *, MMIX-PIPE §167.
name: char [], MMIX-PIPE §76.
o: octa, MMIX-PIPE §40.
OP: op_spec [], §15.
OP_size: int, §15.
op_spec = struct, §14.
ops: tetra [], MMIX-PIPE §76.
panic = macro (), §8.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
ptr_a: void *, MMIX-PIPE §44.
PV: pv_spec [], §15.
PV_size: int, §15.
pv_spec = struct, §12.
random = 0, MMIX-PIPE §164.
repl: replace_policy, MMIX-PIPE §167.
Scache: cache *, MMIX-PIPE §168.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
token: char [], §9.
v: internal_opcode, §14.
v: int *, §12.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
19. Now we can read the specifications and obey them. This program doesn’t bother
to be very tolerant of errors, nor does it try to be very efficient.

Incidentally, the specifications don’t have to be broken into individual lines in any
meaningful way. We simply read them token by token.

( Record all the specs 19 ) =
  rewind(config_file);
  funit_count = 0;
  token[0] = '\0';
  while (strcmp(token, "end") != 0) {
    get_token();
    if (strcmp(token, "end") == 0) break;
    ( If token is a parameter name, process a PV spec 20 );
    ( If token is a cache name, process a cache spec 21 );
    ( If token is an operation name, process a pipe spec 24 );
    if (strcmp(token, "unit") == 0) ( Process a functional spec 25 );
    panic(errprint1(
      "Configuration syntax error: Specification can't start with `%s'", token));
  }

This code is used in section 38. 

20. ( If token is a parameter name, process a PV spec 20 ) =
  for (j = 0; j < PV_size; j++)
    if (strcmp(token, PV[j].name) == 0) {
      n = get_int();
      if (n < PV[j].minval)
        panic(errprint2("Configuration error: %s must be >= %d", PV[j].name,
            PV[j].minval));
      if (n > PV[j].maxval)
        panic(errprint2("Configuration error: %s must be <= %d", PV[j].name,
            PV[j].maxval));
      if (PV[j].power_of_two && (n & (n - 1)))
        panic(errprint1("Configuration error: %s must be a power of 2",
            PV[j].name));
      *(PV[j].v) = n;
      break;
    }
  if (j < PV_size) continue;

This code is used in section 19. 
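The three tests of section 20 can be isolated in a small function for experimentation. This is only a sketch: check_value and its error codes are hypothetical, whereas the program above reports each failure with panic.

```c
#include <assert.h>
#include <limits.h>

/* Return 0 if n is acceptable, else a small error code:
   1 = below the minimum, 2 = above the maximum, 3 = not a power of 2. */
int check_value(int n, int minval, int maxval, int power_of_two)
{
    assert(minval <= maxval);
    if (n < minval) return 1;
    if (n > maxval) return 2;
    if (power_of_two && (n & (n - 1)) != 0) return 3;  /* more than one bit set */
    return 0;
}
```

Two consequences are worth noting. The test n & (n - 1) accepts 0 as well as true powers of 2, which is what lets victimsize be zero. And since get_int returns -1 for a token that isn’t a decimal number, while every minval in the PV and CPV tables is nonnegative, such a token is reported as a value below the minimum.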




21. ( If token is a cache name, process a cache spec 21 ) =
  if (strcmp(token, "ITcache") == 0) {
    pcs(ITcache); continue;
  } else if (strcmp(token, "DTcache") == 0) {
    pcs(DTcache); continue;
  } else if (strcmp(token, "Icache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    pcs(Icache); continue;
  } else if (strcmp(token, "Dcache") == 0) {
    if (!Dcache) Dcache = new_cache("Dcache");
    pcs(Dcache); continue;
  } else if (strcmp(token, "Scache") == 0) {
    if (!Icache) Icache = new_cache("Icache");
    if (!Dcache) Dcache = new_cache("Dcache");
    if (!Scache) Scache = new_cache("Scache");
    pcs(Scache); continue;
  }

This code is used in section 19. 

22. ⟨ Subroutines 10 ⟩ +≡
  static void ppol ARGS((replace_policy *));
  static void ppol(rr)   /* subroutine to scan for a replacement policy */
    replace_policy *rr;
  {
    get_token();
    if (strcmp(token, "random") == 0) *rr = random;
    else if (strcmp(token, "serial") == 0) *rr = serial;
    else if (strcmp(token, "pseudolru") == 0) *rr = pseudo_lru;
    else if (strcmp(token, "lru") == 0) *rr = lru;
    else token_prescanned = true;   /* oops, we should rescan that token */
  }



ARGS = macro (), MMIX-PIPE §6.
config_file: FILE *, §9.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
funit_count: int, MMIX-PIPE §77.
get_int: static int (), §11.
get_token: static void (), §10.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §38.
lru = 3, MMIX-PIPE §164.
maxval: int, §12.
minval: int, §12.
n: register int, §38.
name: char [], §12.
new_cache: static cache *(), §16.
panic = macro (), §8.
pcs: static void (), §23.
power_of_two: bool, §12.
pseudo_lru = 2, MMIX-PIPE §164.
PV: pv_spec [], §15.
PV_size: int, §15.
random = 0, MMIX-PIPE §164.
replace_policy = enum, MMIX-PIPE §164.
rewind: void (), <stdio.h>.
Scache: cache *, MMIX-PIPE §168.
serial = 1, MMIX-PIPE §164.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: int *, §12.

23. ⟨ Subroutines 10 ⟩ +≡
  static void pcs ARGS((cache *));
  static void pcs(c)   /* subroutine to process a cache spec */
    cache *c;
  {
    register int j, n;
    get_token();
    for (j = 0; j < CPV_size; j++)
      if (strcmp(token, CPV[j].name) == 0) break;
    if (j == CPV_size)
      panic(errprint1(
        "Configuration syntax error: `%s' isn't a cache parameter name", token));
    n = get_int();
    if (n < CPV[j].minval)
      panic(errprint2("Configuration error: %s must be >= %d", CPV[j].name,
          CPV[j].minval));
    if (n > CPV[j].maxval)
      panic(errprint2("Configuration error: %s must be <= %d", CPV[j].name,
          CPV[j].maxval));
    if (CPV[j].power_of_two && (n & (n - 1)))
      panic(errprint1("Configuration error: %s must be power of 2", CPV[j].name));
    switch (CPV[j].v) {
    case assoc: c->aa = n; ppol(&(c->repl)); break;
    case blksz: c->bb = n; break;
    case setsz: c->cc = n; break;
    case gran: c->gg = n; break;
    case vctsz: c->vv = n; ppol(&(c->vrepl)); break;
    case wrb: c->mode = (c->mode & ~WRITE_BACK) + n * WRITE_BACK; break;
    case wra: c->mode = (c->mode & ~WRITE_ALLOC) + n * WRITE_ALLOC; break;
    case acctm: if (n > max_cycs) max_cycs = n;
      c->access_time = n; break;
    case citm: if (n > max_cycs) max_cycs = n;
      c->copy_in_time = n; break;
    case cotm: if (n > max_cycs) max_cycs = n;
      c->copy_out_time = n; break;
    case prts: c->ports = n; break;
    }
  }

24. ⟨ If token is an operation name, process a pipe spec 24 ⟩ ≡
  for (j = 0; j < OP_size; j++)
    if (strcmp(token, OP[j].name) == 0) {
      for (i = 0; ; i++) {
        n = get_int();
        if (n < 0) break;
        if (n == 0)
          panic(errprint0("Configuration error: Pipeline cycles must be positive"));
        if (n > 255)
          panic(errprint0("Configuration error: Pipeline cycles must be <= 255"));
        if (n > max_cycs) max_cycs = n;
        if (i >= pipe_limit)
          panic(errprint1("Configuration error: More than %d pipeline stages",
              pipe_limit));
        pipe_seq[OP[j].v][i] = n;
      }
      token_prescanned = true;
      break;
    }
  if (j < OP_size) continue;

This code is used in section 19.



aa: int, MMIX-PIPE §167.
access_time: int, MMIX-PIPE §167.
acctm = 7, §13.
ARGS = macro (), MMIX-PIPE §6.
assoc = 0, §13.
bb: int, MMIX-PIPE §167.
blksz = 1, §13.
cache = struct, MMIX-PIPE §167.
cc: int, MMIX-PIPE §167.
citm = 8, §13.
copy_in_time: int, MMIX-PIPE §167.
copy_out_time: int, MMIX-PIPE §167.
cotm = 9, §13.
CPV: cpv_spec [], §15.
CPV_size: int, §15.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
get_int: static int (), §11.
get_token: static void (), §10.
gg: int, MMIX-PIPE §167.
gran = 3, §13.
i: register int, §38.
j: register int, §38.
max_cycs: int, §15.
maxval: int, §13.
minval: int, §13.
mode: int, MMIX-PIPE §167.
n: register int, §38.
name: char [], §13.
name: char [], §14.
OP: op_spec [], §15.
OP_size: int, §15.
panic = macro (), §8.
pipe_limit = 90, MMIX-PIPE §136.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
ports: int, MMIX-PIPE §167.
power_of_two: bool, §13.
ppol: static void (), §22.
prts = 10, §13.
repl: replace_policy, MMIX-PIPE §167.
setsz = 2, §13.
strcmp: int (), <string.h>.
token: char [], §9.
token_prescanned: bool, §9.
true = 1, MMIX-PIPE §11.
v: c_param, §13.
v: internal_opcode, §14.
vctsz = 4, §13.
vrepl: replace_policy, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
wra = 6, §13.
wrb = 5, §13.
WRITE_ALLOC = 2, MMIX-PIPE §166.
WRITE_BACK = 1, MMIX-PIPE §166.

25. ⟨ Process a functional spec 25 ⟩ ≡

  {
    get_token();
    if (strlen(token) > 15)
      panic(errprint1("Configuration error: `%s' is more than 15 characters long",
          token));
    strcpy(funit[funit_count].name, token);
    get_token();
    if (strlen(token) != 64)
      panic(errprint1("Configuration error: unit %s doesn't have 64 hex digit specs",
          funit[funit_count].name));
    for (i = j = n = 0; j < 64; j++) {
      if (token[j] >= '0' && token[j] <= '9') n = (n << 4) + (token[j] - '0');
      else if (token[j] >= 'a' && token[j] <= 'f') n = (n << 4) + (token[j] - 'a' + 10);
      else if (token[j] >= 'A' && token[j] <= 'F') n = (n << 4) + (token[j] - 'A' + 10);
      else
        panic(errprint1("Configuration error: `%c' is not a hex digit", token[j]));
      if ((j & 0x7) == 0x7) funit[funit_count].ops[i++] = n, n = 0;
    }
    funit_count++;
    continue;
  }

This code is used in section 19.
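The digit-packing loop above can be exercised on its own. The sketch below is an illustration under stated assumptions, not part of the program: `parse_unit_spec` is a hypothetical name, `tetra` is assumed to be 32 bits wide, and a return value of -1 stands in for the calls to `panic`.

```c
#include <string.h>

typedef unsigned int tetra;   /* assumed to be 32 bits wide */

/* Convert a 64-digit hexadecimal string into eight 32-bit words.
   Returns 0 on success, -1 on a wrong length or a non-hex character. */
static int parse_unit_spec(const char *spec, tetra ops[8])
{
  int i = 0, j;
  tetra n = 0;
  if (strlen(spec) != 64) return -1;
  for (j = 0; j < 64; j++) {
    char ch = spec[j];
    if (ch >= '0' && ch <= '9') n = (n << 4) + (tetra)(ch - '0');
    else if (ch >= 'a' && ch <= 'f') n = (n << 4) + (tetra)(ch - 'a' + 10);
    else if (ch >= 'A' && ch <= 'F') n = (n << 4) + (tetra)(ch - 'A' + 10);
    else return -1;
    if ((j & 0x7) == 0x7) ops[i++] = n, n = 0;   /* every eighth digit completes a word */
  }
  return 0;
}
```

Each group of eight hex digits fills one word, so the 64 digits supply 256 bits, one for each possible opcode.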
26. Checking and allocating. The battle is only half over when we’ve absorbed
all the data of the configuration file. We still must check for interactions between 
different quantities, and we must allocate space for cache blocks, coroutines, etc. 

One of the most difficult tasks facing us is to determine the maximum number of 
pipeline stages needed by each functional unit. Let’s tackle that first. 

⟨ Allocate coroutines in each functional unit 26 ⟩ ≡
  ⟨ Build table of pipeline stages needed for each opcode 27 ⟩;
  for (j = 0; j < funit_count; j++) {
    ⟨ Determine the number of stages, n, needed by funit[j] 29 ⟩;
    funit[j].k = n;
    funit[j].co = (coroutine *) calloc(n, sizeof(coroutine));
    for (i = 0; i < n; i++) {
      funit[j].co[i].name = funit[j].name;
      funit[j].co[i].stage = i + 1;
    }
  }

This code is used in section 38.

27. ⟨ Build table of pipeline stages needed for each opcode 27 ⟩ ≡
  for (j = div; j <= max_pipe_op; j++) int_stages[j] = strlen(pipe_seq[j]);
  for ( ; j <= max_real_command; j++) int_stages[j] = 1;
  for (j = mul0, n = 0; j <= mul8; j++)
    if (strlen(pipe_seq[j]) > n) n = strlen(pipe_seq[j]);
  int_stages[mul] = n;
  int_stages[ld] = int_stages[st] = int_stages[frem] = 2;
  for (j = 0; j < 256; j++) stages[j] = int_stages[int_op[j]];

This code is used in section 26.



calloc: void *(), <stdlib.h>.
co: coroutine *, MMIX-PIPE §76.
coroutine = struct, MMIX-PIPE §23.
div = 9, MMIX-PIPE §49.
errprint1 = macro (), §8.
frem = 25, MMIX-PIPE §49.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
get_token: static void (), §10.
i: register int, §38.
int_op: internal_opcode [], §28.
int_stages: int [], §28.
j: register int, §38.
k: int, MMIX-PIPE §76.
ld = 56, MMIX-PIPE §49.
max_pipe_op = feps, MMIX-PIPE §49.
max_real_command = trip, MMIX-PIPE §49.
mul = 26, MMIX-PIPE §49.
mul0 = 0, MMIX-PIPE §49.
mul8 = 8, MMIX-PIPE §49.
n: register int, §38.
name: char [], MMIX-PIPE §76.
ops: tetra [], MMIX-PIPE §76.
panic = macro (), §8.
pipe_seq: unsigned char [][], MMIX-PIPE §136.
st = 63, MMIX-PIPE §49.
stage: int, MMIX-PIPE §23.
stages: int [], §28.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
token: char [], §9.

28. The int_op conversion table is similar to the internal_op array of the MMIX_run
routine, but it replaces divu by div, fsub by fadd, etc.

⟨ Global variables 9 ⟩ +≡
  internal_opcode int_op[256] = {
    trap, fcmp, funeq, funeq, fadd, fix, fadd, fix,
    flot, flot, flot, flot, flot, flot, flot, flot,
    fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
    mul, mul, mul, mul, div, div, div, div,
    add, add, addu, addu, sub, sub, subu, subu,
    addu, addu, addu, addu, addu, addu, addu, addu,
    cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
    sh, sh, sh, sh, sh, sh, sh, sh,
    br, br, br, br, br, br, br, br,
    br, br, br, br, br, br, br, br,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
    cset, cset, cset, cset, cset, cset, cset, cset,
    cset, cset, cset, cset, cset, cset, cset, cset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    zset, zset, zset, zset, zset, zset, zset, zset,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, ld, ld, ld, ld,
    ld, ld, ld, ld, prego, prego, go, go,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, st, st,
    st, st, st, st, st, st, pushgo, pushgo,
    or, or, orn, orn, nor, nor, xor, xor,
    and, and, andn, andn, nand, nand, nxor, nxor,
    bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
    mux, mux, sadd, sadd, mor, mor, mor, mor,
    set, set, set, set, addu, addu, addu, addu,
    or, or, or, or, andn, andn, andn, andn,
    noop, noop, pushj, pushj, set, set, put, put,
    pop, resume, save, unsave, sync, noop, get, trip};
  int int_stages[max_real_command + 1];   /* stages as function of internal_opcode */
  int stages[256];   /* stages as function of mmix_opcode */

29. ⟨ Determine the number of stages, n, needed by funit[j] 29 ⟩ ≡
  for (i = n = 0; i < 256; i++)
    if (((funit[j].ops[i >> 5] << (i & 0x1f)) & 0x80000000) && stages[i] > n)
      n = stages[i];
  if (n == 0)
    panic(errprint1("Configuration error: unit %s doesn't do anything",
        funit[j].name));

This code is used in section 26.
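The test in this loop treats the eight words of ops as one 256-bit string, numbered from the leftmost bit of ops[0]. A minimal sketch of that indexing follows; `unit_handles` and `stages_needed` are hypothetical helper names, and `tetra` is assumed to be a 32-bit unsigned type.

```c
typedef unsigned int tetra;   /* assumed to be 32 bits wide */

/* Nonzero if the bit for opcode op (0..255) is set; bit 0 is the
   most significant bit of ops[0], bit 255 the least significant of ops[7]. */
static int unit_handles(const tetra ops[8], int op)
{
  return ((ops[op >> 5] << (op & 0x1f)) & 0x80000000u) != 0;
}

/* Largest stage count among the opcodes the unit handles;
   a result of 0 means the unit handles nothing. */
static int stages_needed(const tetra ops[8], const int stages[256])
{
  int i, n = 0;
  for (i = 0; i < 256; i++)
    if (unit_handles(ops, i) && stages[i] > n) n = stages[i];
  return n;
}
```

Shifting the selected word left by `op & 0x1f` moves the wanted bit into the sign position, where the mask 0x80000000 isolates it.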






MMIX-CONFIG: CHECKING AND ALLOCATING 



30. The next hardest thing on our agenda is to set up the cache structure fields that 
depend on the parameters. For example, although we have defined the parameter in 
the bb field (the block size), we also need to compute the b field (log of the block 
size), and we must create the cache blocks themselves. 

⟨ Subroutines 10 ⟩ +≡
  static int lg ARGS((int));
  static int lg(n)   /* compute binary logarithm */
    int n;
  { register int j, l;
    for (j = n, l = 0; j; j >>= 1) l++;
    return l - 1;
  }
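A few values make the behavior of lg concrete. The sketch below restates the routine in prototype style so that it compiles on its own; it assumes a nonnegative argument, since right-shifting a negative int is implementation-defined.

```c
/* Binary logarithm rounded down, for n >= 0 (restated from the routine above). */
static int lg(int n)
{
  int j, l;
  for (j = n, l = 0; j; j >>= 1) l++;   /* l counts the significant bits of n */
  return l - 1;
}
```

For a power of two the result is exact, so lg(8) is 3; for other values it rounds down, so lg(48) is 5; and lg(0) is -1 because the loop body never runs.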



add = 29, MMIX-PIPE §49.
addu = 30, MMIX-PIPE §49.
and = 37, MMIX-PIPE §49.
andn = 38, MMIX-PIPE §49.
ARGS = macro (), MMIX-PIPE §6.
b: int, MMIX-PIPE §167.
bb: int, MMIX-PIPE §167.
bdif = 48, MMIX-PIPE §49.
br = 69, MMIX-PIPE §49.
cmp = 46, MMIX-PIPE §49.
cmpu = 47, MMIX-PIPE §49.
cset = 53, MMIX-PIPE §49.
div = 9, MMIX-PIPE §49.
divu = 28, MMIX-PIPE §49.
errprint1 = macro (), §8.
fadd = 14, MMIX-PIPE §49.
fcmp = 22, MMIX-PIPE §49.
fdiv = 16, MMIX-PIPE §49.
feps = 21, MMIX-PIPE §49.
fint = 18, MMIX-PIPE §49.
fix = 19, MMIX-PIPE §49.
flot = 20, MMIX-PIPE §49.
fmul = 15, MMIX-PIPE §49.
frem = 25, MMIX-PIPE §49.
fsqrt = 17, MMIX-PIPE §49.
fsub = 24, MMIX-PIPE §49.
funeq = 23, MMIX-PIPE §49.
funit: func *, MMIX-PIPE §77.
get = 54, MMIX-PIPE §49.
go = 72, MMIX-PIPE §49.
i: register int, §38.
internal_op: internal_opcode [], MMIX-PIPE §51.
internal_opcode = enum, MMIX-PIPE §49.
j: register int, §38.
ld = 56, MMIX-PIPE §49.
max_real_command = trip, MMIX-PIPE §49.
mmix_opcode = enum, MMIX-PIPE §47.
MMIX_run: void (), MMIX-PIPE §10.
mor = 13, MMIX-PIPE §49.
mul = 26, MMIX-PIPE §49.
mux = 11, MMIX-PIPE §49.
n: register int, §38.
name: char [], MMIX-PIPE §76.
nand = 39, MMIX-PIPE §49.
noop = 81, MMIX-PIPE §49.
nor = 36, MMIX-PIPE §49.
nxor = 41, MMIX-PIPE §49.
odif = 51, MMIX-PIPE §49.
ops: tetra [], MMIX-PIPE §76.
or = 34, MMIX-PIPE §49.
orn = 35, MMIX-PIPE §49.
panic = macro (), §8.
pbr = 70, MMIX-PIPE §49.
pop = 75, MMIX-PIPE §49.
prego = 73, MMIX-PIPE §49.
pushgo = 74, MMIX-PIPE §49.
pushj = 71, MMIX-PIPE §49.
put = 55, MMIX-PIPE §49.
resume = 76, MMIX-PIPE §49.
sadd = 12, MMIX-PIPE §49.
save = 77, MMIX-PIPE §49.
set: cacheset *, MMIX-PIPE §167.
sh = 10, MMIX-PIPE §49.
st = 63, MMIX-PIPE §49.
sub = 31, MMIX-PIPE §49.
subu = 32, MMIX-PIPE §49.
sync = 79, MMIX-PIPE §49.
tdif = 50, MMIX-PIPE §49.
trap = 82, MMIX-PIPE §49.
trip = 83, MMIX-PIPE §49.
unsave = 78, MMIX-PIPE §49.
wdif = 49, MMIX-PIPE §49.
xor = 40, MMIX-PIPE §49.
zset = 52, MMIX-PIPE §49.

31. ⟨ Subroutines 10 ⟩ +≡

  static void alloc_cache ARGS((cache *, char *));
  static void alloc_cache(c, name)
    cache *c;
    char *name;
  { register int j, k;
    if (c->bb < c->gg)
      panic(errprint1(
        "Configuration error: blocksize of %s is less than granularity", name));
    if (name[1] == 'T' && c->bb != 8)
      panic(errprint1("Configuration error: blocksize of %s must be 8", name));
    c->a = lg(c->aa);
    c->b = lg(c->bb);
    c->c = lg(c->cc);
    c->g = lg(c->gg);
    c->v = lg(c->vv);
    c->tagmask = -(1 << (c->b + c->c));
    if (c->a + c->b + c->c >= 32)
      panic(errprint1("Configuration error: %s has >= 4 gigabytes of data", name));
    if (c->gg != 8 && !(c->mode & WRITE_ALLOC))
      panic(errprint2(
        "Configuration error: %s does write-around with granularity %d",
          name, c->gg));
    ⟨ Allocate the cache sets for cache c 32 ⟩;
    if (c->vv) ⟨ Allocate the victim cache for cache c 33 ⟩;
    c->inbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->inbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for inbuffer of %s", name));
    c->inbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->inbuf.data)
      panic(errprint1("Can't allocate data for inbuffer of %s", name));
    c->outbuf.dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
    if (!c->outbuf.dirty)
      panic(errprint1("Can't allocate dirty bits for outbuffer of %s", name));
    c->outbuf.data = (octa *) calloc(c->bb >> 3, sizeof(octa));
    if (!c->outbuf.data)
      panic(errprint1("Can't allocate data for outbuffer of %s", name));
    if (name[0] != 'S') ⟨ Allocate reader coroutines for cache c 34 ⟩;
  }
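As a worked instance of the derived fields, consider a cache with associativity 4, 64-byte blocks, and 256 sets: then a = 2, b = 6, c = 8, the capacity is 2^16 bytes, and the tag mask clears the low 14 address bits. The sketch below reproduces that arithmetic. The names `geometry`, `floor_lg`, and `derive_geometry` are hypothetical, and the mask is built with unsigned arithmetic (an assumption of this sketch, chosen to avoid shifting into the sign bit), which gives the same bit pattern as -(1 << (b + c)) on two's complement machines.

```c
/* Hypothetical record of the derived fields; not the real cache struct. */
typedef struct {
  int a, b, c;        /* logarithms of associativity, block size, set count */
  unsigned tagmask;   /* ones in the address bits that belong to the tag */
} geometry;

static int floor_lg(int n)   /* binary logarithm, as in the lg routine of section 30 */
{
  int j, l;
  for (j = n, l = 0; j; j >>= 1) l++;
  return l - 1;
}

/* Returns 0 on success, -1 if the cache would hold 2^32 bytes or more. */
static int derive_geometry(int aa, int bb, int cc, geometry *g)
{
  g->a = floor_lg(aa);
  g->b = floor_lg(bb);
  g->c = floor_lg(cc);
  if (g->a + g->b + g->c >= 32) return -1;
  g->tagmask = ~((1u << (g->b + g->c)) - 1u);
  return 0;
}
```

Masking an address with tagmask keeps only the bits above the set-index and byte-offset fields, which is what distinguishes one block from another within a set.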

32. #define sign_bit 0x80000000

⟨ Allocate the cache sets for cache c 32 ⟩ ≡
  c->set = (cacheset *) calloc(c->cc, sizeof(cacheset));
  if (!c->set) panic(errprint1("Can't allocate cache sets for %s", name));
  for (j = 0; j < c->cc; j++) {
    c->set[j] = (cacheblock *) calloc(c->aa, sizeof(cacheblock));
    if (!c->set[j])
      panic(errprint2("Can't allocate cache blocks for set %d of %s", j, name));
    for (k = 0; k < c->aa; k++) {
      c->set[j][k].tag.h = sign_bit;   /* invalid tag */
      c->set[j][k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->set[j][k].dirty)
        panic(errprint3("Can't allocate dirty bits for block %d of set %d of %s",
            k, j, name));
      c->set[j][k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->set[j][k].data)
        panic(errprint3("Can't allocate data for block %d of set %d of %s", k, j,
            name));
    }
  }

This code is used in section 31.

33. ⟨ Allocate the victim cache for cache c 33 ⟩ ≡
  {
    c->victim = (cacheblock *) calloc(c->vv, sizeof(cacheblock));
    if (!c->victim)
      panic(errprint1("Can't allocate blocks for victim cache of %s", name));
    for (k = 0; k < c->vv; k++) {
      c->victim[k].tag.h = sign_bit;   /* invalid tag */
      c->victim[k].dirty = (char *) calloc(c->bb >> c->g, sizeof(char));
      if (!c->victim[k].dirty)
        panic(errprint2(
          "Can't allocate dirty bits for block %d of victim cache of %s", k, name));
      c->victim[k].data = (octa *) calloc(c->bb >> 3, sizeof(octa));
      if (!c->victim[k].data)
        panic(errprint2("Can't allocate data for block %d of victim cache of %s",
            k, name));
    }
  }

This code is used in section 31.



a: int, MMIX-PIPE §167.
aa: int, MMIX-PIPE §167.
ARGS = macro (), MMIX-PIPE §6.
b: int, MMIX-PIPE §167.
bb: int, MMIX-PIPE §167.
cache = struct, MMIX-PIPE §167.
calloc: void *(), <stdlib.h>.
cc: int, MMIX-PIPE §167.
data: octa *, MMIX-PIPE §167.
dirty: char *, MMIX-PIPE §167.
errprint1 = macro (), §8.
errprint2 = macro (), §8.
errprint3 = macro (), §8.
g: int, MMIX-PIPE §167.
gg: int, MMIX-PIPE §167.
h: tetra, MMIX-PIPE §17.
inbuf: cacheblock, MMIX-PIPE §167.
lg: static int (), §30.
mode: int, MMIX-PIPE §167.
octa = struct, MMIX-PIPE §17.
outbuf: cacheblock, MMIX-PIPE §167.
panic = macro (), §8.
set: cacheset *, MMIX-PIPE §167.
tag: octa, MMIX-PIPE §167.
tagmask: int, MMIX-PIPE §167.
v: int, MMIX-PIPE §167.
victim: cacheset, MMIX-PIPE §167.
vv: int, MMIX-PIPE §167.
WRITE_ALLOC = 2, MMIX-PIPE §166.

34. ⟨ Allocate reader coroutines for cache c 34 ⟩ ≡

  {
    c->reader = (coroutine *) calloc(c->ports, sizeof(coroutine));
    if (!c->reader) panic(errprint1("Can't allocate readers for %s", name));
    for (j = 0; j < c->ports; j++) {
      c->reader[j].stage = vanish;
      c->reader[j].name = (name[0] == 'D' ? (name[1] == 'T' ? "DTreader" : "Dreader") :
          (name[1] == 'T' ? "ITreader" : "Ireader"));
    }
  }

This code is used in section 31.

35. ⟨ Allocate the caches 35 ⟩ ≡
  alloc_cache(ITcache, "ITcache");
  ITcache->filler.name = "ITfiller"; ITcache->filler.stage = fill_from_virt;
  alloc_cache(DTcache, "DTcache");
  DTcache->filler.name = "DTfiller"; DTcache->filler.stage = fill_from_virt;
  if (Icache) {
    alloc_cache(Icache, "Icache");
    Icache->filler.name = "Ifiller"; Icache->filler.stage = fill_from_mem;
  }
  if (Dcache) {
    alloc_cache(Dcache, "Dcache");
    Dcache->filler.name = "Dfiller"; Dcache->filler.stage = fill_from_mem;
    Dcache->flusher.name = "Dflusher"; Dcache->flusher.stage = flush_to_mem;
  }
  if (Scache) {
    alloc_cache(Scache, "Scache");
    if (Scache->bb < Icache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Icache blocks"));
    if (Scache->bb < Dcache->bb)
      panic(errprint0("Configuration error: Scache blocks smaller than Dcache blocks"));
    if (Scache->gg != Dcache->gg)
      panic(errprint0("Configuration error: Scache granularity differs from the Dcache"));
    Icache->filler.stage = fill_from_S;
    Dcache->filler.stage = fill_from_S; Dcache->flusher.stage = flush_to_S;
    Scache->filler.name = "Sfiller"; Scache->filler.stage = fill_from_mem;
    Scache->flusher.name = "Sflusher"; Scache->flusher.stage = flush_to_mem;
  }

This code is used in section 38.

36. Now we are nearly done. The only nontrivial task remaining is to allocate the 
ring of queues for coroutine scheduling; for this we need to determine the maximum 
waiting time that will occur between scheduler and schedulee. 

⟨ Allocate the scheduling queue 36 ⟩ ≡
  bus_words = mem_bus_bytes >> 3;
  j = (mem_read_time < mem_write_time ? mem_write_time : mem_read_time);
  n = 1;
  if (Scache && Scache->bb > n) n = Scache->bb;
  if (Icache && Icache->bb > n) n = Icache->bb;
  if (Dcache && Dcache->bb > n) n = Dcache->bb;
  n = mem_addr_time + ((int)(n + bus_words - 1) / bus_words) * j;
  if (n > max_cycs) max_cycs = n;   /* now max_cycs bounds the waiting time */
  ring_size = max_cycs + 1;
  ring = (coroutine *) calloc(ring_size, sizeof(coroutine));
  if (!ring) panic(errprint0("Can't allocate the scheduling ring"));
  { register coroutine *p;
    for (p = ring; p < ring + ring_size; p++) {
      p->name = "";   /* header nodes are nameless */
      p->stage = max_stage;
    }
  }

This code is used in section 38.
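The bound on waiting time is easy to check by hand for particular settings. The sketch below mirrors the arithmetic exactly as printed above; `ring_size_for` is a hypothetical helper, its parameters stand in for the global configuration variables, and it assumes mem_bus_bytes is at least 8 so that bus_words is nonzero.

```c
/* Ring size implied by the memory timing parameters; a hypothetical helper.
   max_block_bytes is the largest bb among the caches present (or 0 if none). */
static int ring_size_for(int max_cycs, int mem_bus_bytes, int mem_addr_time,
                         int mem_read_time, int mem_write_time,
                         int max_block_bytes)
{
  int bus_words = mem_bus_bytes >> 3;   /* octabytes per bus transfer */
  int j = mem_read_time < mem_write_time ? mem_write_time : mem_read_time;
  int n = max_block_bytes > 1 ? max_block_bytes : 1;
  n = mem_addr_time + ((n + bus_words - 1) / bus_words) * j;
  if (n > max_cycs) max_cycs = n;       /* max_cycs now bounds the waiting time */
  return max_cycs + 1;
}
```

With an 8-byte bus, address and transfer times of 20 cycles, and 64-byte blocks, the formula gives 20 + 64 * 20 = 1300 cycles, hence a ring of 1301 slots unless some other latency already exceeds that.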



alloc_cache: static void (), §31.
bb: int, MMIX-PIPE §167.
bus_words: int, MMIX-PIPE §214.
c: cache *, §31.
calloc: void *(), <stdlib.h>.
coroutine = struct, MMIX-PIPE §23.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
fill_from_mem = 95, MMIX-PIPE §129.
fill_from_S = 94, MMIX-PIPE §129.
fill_from_virt = 93, MMIX-PIPE §129.
filler: coroutine, MMIX-PIPE §167.
flush_to_mem = 97, MMIX-PIPE §129.
flush_to_S = 96, MMIX-PIPE §129.
flusher: coroutine, MMIX-PIPE §167.
gg: int, MMIX-PIPE §167.
Icache: cache *, MMIX-PIPE §168.
ITcache: cache *, MMIX-PIPE §168.
j: register int, §31.
j: register int, §38.
max_cycs: int, §15.
max_stage = 99, MMIX-PIPE §129.
mem_addr_time: int, MMIX-PIPE §214.
mem_bus_bytes: int, §15.
mem_read_time: int, MMIX-PIPE §214.
mem_write_time: int, MMIX-PIPE §214.
n: register int, §38.
name: char *, §31.
name: char *, MMIX-PIPE §23.
panic = macro (), §8.
ports: int, MMIX-PIPE §167.
reader: coroutine *, MMIX-PIPE §167.
ring: coroutine *, MMIX-PIPE §29.
ring_size: int, MMIX-PIPE §29.
Scache: cache *, MMIX-PIPE §168.
stage: int, MMIX-PIPE §23.
vanish = 98, MMIX-PIPE §129.

37. ⟨ Touch up last-minute trivia 37 ⟩ ≡
  if (hash_prime <= mem_chunks_max)
    panic(errprint0("Configuration error: hashprime must exceed memchunksmax"));
  mem_hash = (chunknode *) calloc(hash_prime + 1, sizeof(chunknode));
  if (!mem_hash) panic(errprint0("Can't allocate the hash table"));
  mem_hash[0].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[0].chunk) panic(errprint0("Can't allocate chunk 0"));
  mem_hash[hash_prime].chunk = (octa *) calloc(1 << 13, sizeof(octa));
  if (!mem_hash[hash_prime].chunk) panic(errprint0("Can't allocate 0 chunk"));
  mem_chunks = 1;
  fetch_bot = (fetch *) calloc(fetch_buf_size + 1, sizeof(fetch));
  if (!fetch_bot) panic(errprint0("Can't allocate the fetch buffer"));
  fetch_top = fetch_bot + fetch_buf_size;
  reorder_bot = (control *) calloc(reorder_buf_size + 1, sizeof(control));
  if (!reorder_bot) panic(errprint0("Can't allocate the reorder buffer"));
  reorder_top = reorder_bot + reorder_buf_size;
  wbuf_bot = (write_node *) calloc(write_buf_size + 1, sizeof(write_node));
  if (!wbuf_bot) panic(errprint0("Can't allocate the write buffer"));
  wbuf_top = wbuf_bot + write_buf_size;
  if (bp_n == 0) bp_table = NULL;
  else {   /* a branch prediction table is desired */
    if (bp_a + bp_b + bp_c >= 31)
      panic(errprint0("Configuration error: Branch table has >= 2 gigabytes of data"));
    bp_table = (char *) calloc(1 << (bp_a + bp_b + bp_c), sizeof(char));
    if (!bp_table) panic(errprint0("Can't allocate the branch table"));
  }
  l = (specnode *) calloc(lring_size, sizeof(specnode));
  if (!l) panic(errprint0("Can't allocate local registers"));
  j = bus_words;
  if (Icache && (Icache->bb >> 3) > j) j = Icache->bb >> 3;
  fetched = (octa *) calloc(j, sizeof(octa));
  if (!fetched) panic(errprint0("Can't allocate prefetch buffer"));
  dispatch_stat = (int *) calloc(dispatch_max + 1, sizeof(int));
  if (!dispatch_stat) panic(errprint0("Can't allocate dispatch counts"));
  no_hardware_PT = 1 - hardware_PT;

This code is used in section 38.

38. Putting it all together. Here then is the desired configuration subroutine.



  #include <stdio.h>    /* fopen, fgets, sscanf, rewind */
  #include <stdlib.h>   /* calloc, exit */
  #include <ctype.h>    /* isspace */
  #include <string.h>   /* strcpy, strlen, strcmp */
  #include <limits.h>   /* INT_MAX */
  #include "mmix-pipe.h"
  ⟨ Type definitions 12 ⟩
  ⟨ Global variables 9 ⟩
  ⟨ Subroutines 10 ⟩
  void MMIX_config(filename)
    char *filename;
  { register int i, j, n;
    config_file = fopen(filename, "r");
    if (!config_file)
      panic(errprint1("Can't open configuration file %s", filename));
    ⟨ Initialize to defaults 17 ⟩;
    ⟨ Count and allocate the functional units 18 ⟩;
    ⟨ Record all the specs 19 ⟩;
    ⟨ Allocate coroutines in each functional unit 26 ⟩;
    ⟨ Allocate the caches 35 ⟩;
    ⟨ Allocate the scheduling queue 36 ⟩;
    ⟨ Touch up last-minute trivia 37 ⟩;
  }



bb: int, MMIX-PIPE §167.
bp_a: int, MMIX-PIPE §150.
bp_b: int, MMIX-PIPE §150.
bp_c: int, MMIX-PIPE §150.
bp_n: int, MMIX-PIPE §150.
bp_table: char *, MMIX-PIPE §150.
bus_words: int, MMIX-PIPE §214.
calloc: void *(), <stdlib.h>.
chunk: octa *, MMIX-PIPE §206.
chunknode = struct, MMIX-PIPE §206.
config_file: FILE *, §9.
control = struct, MMIX-PIPE §44.
dispatch_max: int, MMIX-PIPE §59.
dispatch_stat: int *, MMIX-PIPE §66.
errprint0 = macro (), §8.
errprint1 = macro (), §8.
exit: void (), <stdlib.h>.
fetch = struct, MMIX-PIPE §68.
fetch_bot: fetch *, MMIX-PIPE §69.
fetch_buf_size: int, §15.
fetch_top: fetch *, MMIX-PIPE §69.
fetched: octa *, MMIX-PIPE §284.
fgets: char *(), <stdio.h>.
fopen: FILE *(), <stdio.h>.
hardware_PT: int, §15.
hash_prime: int, MMIX-PIPE §207.
Icache: cache *, MMIX-PIPE §168.
INT_MAX = macro, <limits.h>.
isspace: int (), <ctype.h>.
l: specnode *, MMIX-PIPE §86.
lring_size: int, MMIX-PIPE §86.
mem_chunks: int, MMIX-PIPE §207.
mem_chunks_max: int, MMIX-PIPE §207.
mem_hash: chunknode *, MMIX-PIPE §207.
no_hardware_PT: bool, MMIX-PIPE §242.
octa = struct, MMIX-PIPE §17.
panic = macro (), §8.
reorder_bot: control *, MMIX-PIPE §60.
reorder_buf_size: int, §15.
reorder_top: control *, MMIX-PIPE §60.
rewind: void (), <stdio.h>.
specnode = struct, MMIX-PIPE §40.
sscanf: int (), <stdio.h>.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
wbuf_bot: write_node *, MMIX-PIPE §247.
wbuf_top: write_node *, MMIX-PIPE §247.
write_buf_size: int, §15.
write_node = struct, MMIX-PIPE §246.

39. Names of the sections.

⟨ Allocate coroutines in each functional unit 26 ⟩ Used in section 38.
⟨ Allocate reader coroutines for cache c 34 ⟩ Used in section 31.
⟨ Allocate the cache sets for cache c 32 ⟩ Used in section 31.
⟨ Allocate the caches 35 ⟩ Used in section 38.
⟨ Allocate the scheduling queue 36 ⟩ Used in section 38.
⟨ Allocate the victim cache for cache c 33 ⟩ Used in section 31.
⟨ Build table of pipeline stages needed for each opcode 27 ⟩ Used in section 26.
⟨ Count and allocate the functional units 18 ⟩ Used in section 38.
⟨ Determine the number of stages, n, needed by funit[j] 29 ⟩ Used in section 26.
⟨ Global variables 9, 15, 28 ⟩ Used in section 38.
⟨ If token is a cache name, process a cache spec 21 ⟩ Used in section 19.
⟨ If token is a parameter name, process a PV spec 20 ⟩ Used in section 19.
⟨ If token is an operation name, process a pipe spec 24 ⟩ Used in section 19.
⟨ Initialize to defaults 17 ⟩ Used in section 38.
⟨ Process a functional spec 25 ⟩ Used in section 19.
⟨ Record all the specs 19 ⟩ Used in section 38.
⟨ Subroutines 10, 11, 16, 22, 23, 30, 31 ⟩ Used in section 38.
⟨ Touch up last-minute trivia 37 ⟩ Used in section 38.
⟨ Type definitions 12, 13, 14 ⟩ Used in section 38.

MMIX-IO

1. Introduction. This program module contains brute-force implementations of
the ten input/output primitives defined at the beginning of MMIX-SIM. The
subroutines are grouped here as a separate package, because they are intended to
be loaded with the pipeline simulator as well as with the simple simulator.

  ⟨ Preprocessor macros 2 ⟩
  ⟨ Type definitions 3 ⟩
  ⟨ External subroutines 4 ⟩
  ⟨ Global variables 6 ⟩
  ⟨ Subroutines 7 ⟩

2. Of course we include standard C library routines, and we set things up to 
accommodate older versions of C. 

⟨ Preprocessor macros 2 ⟩ ≡
  #include <stdio.h>
  #include <stdlib.h>
  #ifdef __STDC__
  #define ARGS(list) list
  #else
  #define ARGS(list) ()
  #endif
  #ifndef FILENAME_MAX
  #define FILENAME_MAX 256
  #endif
  #ifndef SEEK_SET
  #define SEEK_SET 0
  #endif
  #ifndef SEEK_END
  #define SEEK_END 2
  #endif

This code is used in section 1.

3. The unsigned 32-bit type tetra must agree with its definition in the simulators. 

⟨ Type definitions 3 ⟩ ≡
  typedef unsigned int tetra;
  typedef struct {
    tetra h, l;
  } octa;   /* two tetrabytes make one octabyte */

See also section 5.

This code is used in section 1.

4. Three basic subroutines are used to get strings from the simulated memory and 
to put strings into that memory. These subroutines are defined appropriately in each 
simulator. We also use a few subroutines and constants defined in MMIX-ARITH. 

⟨ External subroutines 4 ⟩ ≡
  extern char stdin_chr ARGS((void));
  extern int mmgetchars ARGS((char *buf, int size, octa addr, int stop));
  extern void mmputchars ARGS((unsigned char *buf, int size, octa addr));
  extern octa oplus ARGS((octa, octa));
  extern octa ominus ARGS((octa, octa));
  extern octa incr ARGS((octa, int));
  extern octa zero_octa;   /* zero_octa.h = zero_octa.l = 0 */
  extern octa neg_one;   /* neg_one.h = neg_one.l = -1 */

This code is used in section 1.

5. Each possible handle has a file pointer and a current mode. 

⟨ Type definitions 3 ⟩ +≡
  typedef struct {
    FILE *fp;   /* file pointer */
    int mode;   /* [read OK] + 2[write OK] + 4[binary] + 8[readwrite] */
  } sim_file_info;

6. ⟨ Global variables 6 ⟩ ≡
  sim_file_info sfile[256];

See also sections 9 and 24.

This code is used in section 1.

7. The first three handles are initially open. 

⟨ Subroutines 7 ⟩ ≡
  void mmix_io_init ARGS((void));
  void mmix_io_init()
  {
    sfile[0].fp = stdin, sfile[0].mode = 1;
    sfile[1].fp = stdout, sfile[1].mode = 2;
    sfile[2].fp = stderr, sfile[2].mode = 2;
  }

See also sections 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, and 23.

This code is used in section 1.



__STDC__, Standard C.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
incr: octa (), MMIX-ARITH §6.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
neg_one: octa, MMIX-ARITH §4.
ominus: octa (), MMIX-ARITH §5.
oplus: octa (), MMIX-ARITH §5.
SEEK_END = macro, <stdio.h>.
SEEK_SET = macro, <stdio.h>.
stderr: FILE *, <stdio.h>.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
stdout: FILE *, <stdio.h>.
zero_octa: octa, MMIX-ARITH §4.

8. The only tricky thing about these routines is that we want to protect the standard
input, output, and error streams from being preempted.

⟨ Subroutines 7 ⟩ +≡
  octa mmix_fopen ARGS((unsigned char, octa, octa));
  octa mmix_fopen(handle, name, mode)
    unsigned char handle;
    octa name, mode;
  {
    char name_buf[FILENAME_MAX];
    if (mode.h || mode.l > 4) goto abort;
    if (mmgetchars(name_buf, FILENAME_MAX, name, 0) == FILENAME_MAX) goto abort;
    if (sfile[handle].mode != 0 && handle > 2) fclose(sfile[handle].fp);
    sfile[handle].fp = fopen(name_buf, mode_string[mode.l]);
    if (!sfile[handle].fp) goto abort;
    sfile[handle].mode = mode_code[mode.l];
    return zero_octa;   /* success */
  abort: sfile[handle].mode = 0;
    return neg_one;   /* failure */
  }

9. ( Global variables 6 ) +=

char *mode_string[] = {"r", "w", "rb", "wb", "w+b"};
int mode_code[] = {0x1, 0x2, 0x5, 0x6, 0xf};
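
The mode codes are sums of the four bit values named in section 5. The following is a minimal sketch of how such a code can be tested and updated, not part of MMIX-IO itself. The enum names and helper functions are hypothetical, and the table assumes that "w+b" carries all four bits (1 + 2 + 4 + 8), as the encoding in section 5 implies.

```c
/* Bit meanings taken from the comment on `mode' in section 5. */
enum { READ_OK = 1, WRITE_OK = 2, BINARY = 4, READWRITE = 8 };

/* Hypothetical helper: mode code for fopen mode number m (0..4), or -1. */
int mode_code_for(unsigned m)
{
    static const int code[] = {0x1, 0x2, 0x5, 0x6, 0xf};
    return m <= 4 ? code[m] : -1;
}

/* Hypothetical helpers: is reading or writing currently permitted? */
int can_read(int mode)  { return (mode & READ_OK) != 0; }
int can_write(int mode) { return (mode & WRITE_OK) != 0; }

/* A read on a read/write file turns off the write bit, as the read
   routines of this module do; other modes are left unchanged. */
int after_read(int mode)
{
    return (mode & READWRITE) ? (mode & ~WRITE_OK) : mode;
}
```

For example, after_read(0xf) leaves a mode in which can_read holds but can_write does not, which is the state a "w+b" file is in between a read and the next seek.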

10. If the simulator is being used interactively, we can avoid competition for stdin 
by substituting another file. 

( Subroutines 7 ) += 

void mmix_fake_stdin ARGS((FILE *));
void mmix_fake_stdin(f)
  FILE *f;
{
  sfile[0].fp = f; /* f should be open in mode "r" */
}

11. ( Subroutines 7 ) +=

octa mmix_fclose ARGS((unsigned char));
octa mmix_fclose(handle)
  unsigned char handle;
{
  if (sfile[handle].mode == 0) return neg_one;
  if (handle > 2 && fclose(sfile[handle].fp) != 0) return neg_one;
  sfile[handle].mode = 0;
  return zero_octa; /* success */
}
12. ( Subroutines 7 ) +=

octa mmix_fread ARGS((unsigned char, octa, octa));
octa mmix_fread(handle, buffer, size)
  unsigned char handle;
  octa buffer, size;
{
  register unsigned char *buf;
  register int n;
  octa o;
  o = neg_one;
  if (!(sfile[handle].mode & 0x1)) goto done;
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
  if (size.h) goto done;
  buf = (unsigned char *) calloc(size.l, sizeof(char));
  if (!buf) goto done;
  ( Read n ≤ size.l characters into buf 13 );
  mmputchars(buf, n, buffer);
  free(buf);
  o.h = 0, o.l = n;
done: return ominus(o, size);
}
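
The result of this routine is o minus size, so it encodes both success and shortfall in a single octabyte. The sketch below models that convention with ordinary 64-bit integers instead of the octa type; the function name and the use of int64_t are assumptions made for illustration only.

```c
#include <stdint.h>

/* Model of the value returned at `done'.  n_read is the number of bytes
   actually read; a negative n_read stands for the failure paths, where
   o is still neg_one when control reaches `done'. */
int64_t fread_result(int64_t n_read, uint64_t size)
{
    int64_t o = (n_read < 0) ? -1 : n_read;
    /* ominus(o, size): subtraction modulo 2^64, reinterpreted as signed */
    return (int64_t)((uint64_t)o - size);
}
```

A complete read therefore yields 0, a short read yields the negative of the number of missing bytes, and a failure yields -1 - size.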

13. ( Read n ≤ size.l characters into buf 13 ) =
if (sfile[handle].fp == stdin) {
  register unsigned char *p;
  for (p = buf, n = size.l; p < buf + n; p++) *p = stdin_chr();
}
else {
  clearerr(sfile[handle].fp);
  n = fread(buf, 1, size.l, sfile[handle].fp);
  if (ferror(sfile[handle].fp)) {
    free(buf);
    goto done;
  }
}

This code is used in section 12. 



ARGS = macro ( ), §2.
calloc: void *(), <stdlib.h>.
clearerr: void (), <stdio.h>.
fclose: int (), <stdio.h>.
ferror: int (), <stdio.h>.
FILE, <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fp: FILE *, §5.
fread: size_t (), <stdio.h>.
free: void (), <stdlib.h>.
h: tetra, §3.
l: tetra, §3.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
ominus: octa (), MMIX-ARITH §5.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
14. ( Subroutines 7 ) +=

octa mmix_fgets ARGS((unsigned char, octa, octa));
octa mmix_fgets(handle, buffer, size)
  unsigned char handle;
  octa buffer, size;
{
  char buf[256];
  register int n, s;
  register char *p;
  octa o;
  int eof = 0;
  if (!(sfile[handle].mode & 0x1)) return neg_one;
  if (!size.l && !size.h) return neg_one;
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
  size = incr(size, -1);
  o = zero_octa;
  while (1) {
    ( Read n < 256 characters into buf 15 );
    mmputchars((unsigned char *) buf, n + 1, buffer);
    o = incr(o, n);
    size = incr(size, -n);
    if ((n && buf[n - 1] == '\n') || (!size.l && !size.h) || eof) return o;
    buffer = incr(buffer, n);
  }
}

15. ( Read n < 256 characters into buf 15 ) =
s = 255;
if (size.l < s && !size.h) s = size.l;
if (sfile[handle].fp == stdin)
  for (p = buf, n = 0; n < s; ) {
    *p = stdin_chr();
    n++;
    if (*p++ == '\n') break;
  }
else {
  if (!fgets(buf, s + 1, sfile[handle].fp)) return neg_one;
  eof = feof(sfile[handle].fp);
  for (p = buf, n = 0; n < s; ) {
    if (!*p && eof) break;
    n++;
    if (*p++ == '\n') break;
  }
}
*p = '\0';

This code is used in section 14. 
16. The routines that deal with wyde characters might need to be changed on a
system that is little-endian; the author wishes good luck to whoever has to do this.
MMIX is always big-endian, but external files prepared on random operating systems
might be backwards.

( Subroutines 7 ) +=

octa mmix_fgetws ARGS((unsigned char, octa, octa));
octa mmix_fgetws(handle, buffer, size)
  unsigned char handle;
  octa buffer, size;
{
  char buf[256];
  register int n, s;
  register char *p;
  octa o;
  int eof = 0;
  if (!(sfile[handle].mode & 0x1)) return neg_one;
  if (!size.l && !size.h) return neg_one;
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x2;
  buffer.l &= -2;
  size = incr(size, -1);
  o = zero_octa;
  while (1) {
    ( Read n < 128 wyde characters into buf 17 );
    mmputchars((unsigned char *) buf, 2 * n + 2, buffer);
    o = incr(o, n);
    size = incr(size, -n);
    if ((n && buf[2 * n - 1] == '\n' && buf[2 * n - 2] == 0) || (!size.l && !size.h) || eof)
      return o;
    buffer = incr(buffer, 2 * n);
  }
}



ARGS = macro ( ), §2.
feof: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fp: FILE *, §5.
h: tetra, §3.
incr: octa (), MMIX-ARITH §6.
l: tetra, §3.
mmputchars: void (), MMIX-SIM §117.
mmputchars: void (), MMIX-PIPE §384.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
sfile: sim_file_info [], §6.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
17. ( Read n < 128 wyde characters into buf 17 ) =
s = 127;
if (size.l < s && !size.h) s = size.l;
if (sfile[handle].fp == stdin)
  for (p = buf, n = 0; n < s; ) {
    *p++ = stdin_chr(); *p++ = stdin_chr();
    n++;
    if (*(p - 1) == '\n' && *(p - 2) == 0) break;
  }
else
  for (p = buf, n = 0; n < s; ) {
    if (fread(p, 1, 2, sfile[handle].fp) != 2) {
      eof = feof(sfile[handle].fp);
      if (!eof) return neg_one;
      break;
    }
    n++, p += 2;
    if (*(p - 1) == '\n' && *(p - 2) == 0) break;
  }
*p = *(p + 1) = '\0';

This code is used in section 16. 
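
The wyde routines treat each character as two bytes with the high-order byte first, which is why a wyde newline is recognized as a zero byte followed by '\n'. A minimal sketch of that byte-order-independent view follows; the helper names are hypothetical and do not appear in MMIX-IO.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a big-endian wyde from two consecutive bytes.  Working with
   bytes gives the same answer on big-endian and little-endian hosts. */
uint16_t wyde_from_bytes(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Does a buffer holding n wydes end with a wyde newline, that is,
   with the bytes 0x00 and '\n' in that order? */
int ends_with_wyde_newline(const unsigned char *buf, size_t n)
{
    return n > 0 && wyde_from_bytes(buf + 2 * (n - 1)) == (uint16_t)'\n';
}
```

A file written with the low-order byte first would present the newline as '\n' followed by 0x00, which this test rejects; that is the situation the remark about "backwards" external files refers to.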

18. ( Subroutines 7 ) +=

octa mmix_fwrite ARGS((unsigned char, octa, octa));
octa mmix_fwrite(handle, buffer, size)
  unsigned char handle;
  octa buffer, size;
{
  char buf[256];
  register int n;
  if (!(sfile[handle].mode & 0x2)) return ominus(zero_octa, size);
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
  while (1) {
    if (size.h || size.l >= 256) n = mmgetchars(buf, 256, buffer, -1);
    else n = mmgetchars(buf, size.l, buffer, -1);
    size = incr(size, -n);
    if (fwrite(buf, 1, n, sfile[handle].fp) != n) return ominus(zero_octa, size);
    fflush(sfile[handle].fp);
    if (!size.l && !size.h) return zero_octa;
    buffer = incr(buffer, n);
  }
}

19. ( Subroutines 7 ) +=

octa mmix_fputs ARGS((unsigned char, octa));
octa mmix_fputs(handle, string)
  unsigned char handle;
  octa string;
{
  char buf[256];
  register int n;
  octa o;
  o = zero_octa;
  if (!(sfile[handle].mode & 0x2)) return neg_one;
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
  while (1) {
    n = mmgetchars(buf, 256, string, 0);
    if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
    o = incr(o, n);
    if (n < 256) {
      fflush(sfile[handle].fp);
      return o;
    }
    string = incr(string, n);
  }
}

20. ( Subroutines 7 ) +=

octa mmix_fputws ARGS((unsigned char, octa));
octa mmix_fputws(handle, string)
  unsigned char handle;
  octa string;
{
  char buf[256];
  register int n;
  octa o;
  o = zero_octa;
  if (!(sfile[handle].mode & 0x2)) return neg_one;
  if (sfile[handle].mode & 0x8) sfile[handle].mode &= ~0x1;
  while (1) {
    n = mmgetchars(buf, 256, string, 1);
    if (fwrite(buf, 1, n, sfile[handle].fp) != n) return neg_one;
    o = incr(o, n >> 1);
    if (n < 256) {
      fflush(sfile[handle].fp);
      return o;
    }
    string = incr(string, n);
  }
}



ARGS = macro ( ), §2.
buf: char [], §16.
eof: int, §16.
feof: int (), <stdio.h>.
fflush: int (), <stdio.h>.
fp: FILE *, §5.
fread: size_t (), <stdio.h>.
fwrite: size_t (), <stdio.h>.
h: tetra, §3.
handle: unsigned char, §16.
incr: octa (), MMIX-ARITH §6.
l: tetra, §3.
mmgetchars: int (), MMIX-SIM §114.
mmgetchars: int (), MMIX-PIPE §381.
mode: int, §5.
n: register int, §16.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
ominus: octa (), MMIX-ARITH §5.
p: register char *, §16.
s: register int, §16.
sfile: sim_file_info [], §6.
size: octa, §16.
stdin: FILE *, <stdio.h>.
stdin_chr: char (), MMIX-SIM §120.
stdin_chr: char (), MMIX-PIPE §387.
zero_octa: octa, MMIX-ARITH §4.
21. #define sign_bit ((unsigned) 0x80000000)

( Subroutines 7 ) +=

octa mmix_fseek ARGS((unsigned char, octa));
octa mmix_fseek(handle, offset)
  unsigned char handle;
  octa offset;
{
  if (!(sfile[handle].mode & 0x4)) return neg_one;
  if (sfile[handle].mode & 0x8) sfile[handle].mode = 0xf;
  if (offset.h & sign_bit) {
    if (offset.h != 0xffffffff || !(offset.l & sign_bit)) return neg_one;
    if (fseek(sfile[handle].fp, (int) offset.l + 1, SEEK_END) != 0) return neg_one;
  } else {
    if (offset.h || (offset.l & sign_bit)) return neg_one;
    if (fseek(sfile[handle].fp, (int) offset.l, SEEK_SET) != 0) return neg_one;
  }
  return zero_octa;
}
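
The offset convention of this routine is compact: a nonnegative offset counts from the beginning of the file, while a negative offset k means position k + 1 relative to the end, so that -1 denotes the end of file itself. The sketch below restates that decision with a 64-bit signed integer in place of an octa; seek_plan and plan_seek are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int ok;      /* nonzero if the offset is acceptable */
    int whence;  /* SEEK_SET or SEEK_END */
    long off;    /* offset to pass to fseek */
} seek_plan;

/* Decide how a signed 64-bit offset would be handed to fseek.
   Only offsets that fit in 32 signed bits are accepted. */
seek_plan plan_seek(int64_t offset)
{
    seek_plan p = {0, SEEK_SET, 0};
    if (offset < 0) {
        if (offset < -(int64_t)0x80000000) return p;  /* too negative */
        p.whence = SEEK_END;
        p.off = (long)(offset + 1);                   /* -1 means the very end */
    } else {
        if (offset > 0x7fffffff) return p;            /* too large */
        p.off = (long)offset;
    }
    p.ok = 1;
    return p;
}
```

Thus plan_seek(-1) asks for SEEK_END with offset 0, and plan_seek(100) asks for SEEK_SET with offset 100.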

22. ( Subroutines 7 ) +=

octa mmix_ftell ARGS((unsigned char));
octa mmix_ftell(handle)
  unsigned char handle;
{
  register long x;
  octa o;
  if (!(sfile[handle].mode & 0x4)) return neg_one;
  x = ftell(sfile[handle].fp);
  if (x < 0) return neg_one;
  o.h = 0, o.l = x;
  return o;
}

23. One last subroutine belongs here, just in case the user has modified the standard 
error handle. 

( Subroutines 7 ) += 

void print_trip_warning ARGS((int, octa));
void print_trip_warning(n, loc)
  int n;
  octa loc;
{
  if (sfile[2].mode & 0x2) fprintf(sfile[2].fp, "Warning: %s at location %08x%08x\n",
      trip_warning[n], loc.h, loc.l);
}

24. ( Global variables 6 ) +=

char *trip_warning[] = {"TRIP", "integer divide check", "integer overflow",
  "float-to-fix overflow", "invalid floating point operation",
  "floating point overflow", "floating point underflow",
  "floating point division by zero", "floating point inexact"};
25. Names of the sections.

( External subroutines 4 ) Used in section 1.
( Global variables 6, 9, 24 ) Used in section 1.
( Preprocessor macros 2 ) Used in section 1.
( Read n < 128 wyde characters into buf 17 ) Used in section 16.
( Read n < 256 characters into buf 15 ) Used in section 14.
( Read n ≤ size.l characters into buf 13 ) Used in section 12.
( Subroutines 7, 8, 10, 11, 12, 14, 16, 18, 19, 20, 21, 22, 23 ) Used in section 1.
( Type definitions 3, 5 ) Used in section 1.



ARGS = macro ( ), §2.
fp: FILE *, §5.
fprintf: int (), <stdio.h>.
fseek: int (), <stdio.h>.
ftell: long (), <stdio.h>.
h: tetra, §3.
l: tetra, §3.
mode: int, §5.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §3.
SEEK_END = macro, <stdio.h>.
SEEK_SET = macro, <stdio.h>.
sfile: sim_file_info [], §6.
zero_octa: octa, MMIX-ARITH §4.



MMIX-MEM 

1. Memory-mapped input and output. This module supplies procedures for
reading from and writing to MMIX memory addresses that exceed 48 bits. Such
addresses are used by the operating system for input and output, so they require
special treatment. At present only dummy versions of these routines are implemented.
Users who need nontrivial versions of spec_read and/or spec_write should prepare their
own and link them with the rest of the simulator.

Many I/O devices communicate via bytes or wydes or tetras instead of octabytes. 
So these prototype routines have a size parameter, to distinguish between the various 
kinds of quantities that MMIX wants to read from and write to the memory-mapped 
addresses. 

#include <stdio.h>
#include "mmix-pipe.h" /* header file for all modules */
extern octa read_hex(); /* found in the main program module */
static char buf[20];
static char *kind[] = {"byte", "wyde", "tetra", "octa"};
extern octa shift_left ARGS((octa y, int s)); /* y ≪ s, 0 ≤ s ≤ 64 */
extern octa shift_right ARGS((octa y, int s, int u)); /* y ≫ s, signed if !u */

2. If the interactive_read_bit of the verbose control is set, the user is supposed to
supply values dynamically. Otherwise zero is read.

octa spec_read ARGS((octa, int));
octa spec_read(addr, size)
  octa addr;
  int size;
{
  octa val;
  size &= 0x3, addr.l &= -(1 << size);
  if (verbose & interactive_read_bit) {
    printf("** Read %s from loc %08x%08x: ", kind[size], addr.h, addr.l);
    fgets(buf, 20, stdin);
    val = read_hex(buf);
  }
  else val.l = val.h = 0;
  switch (size) { /* the cases fall through on purpose */
  case 0: val.l &= 0xff;
  case 1: val.l &= 0xffff;
  case 2: val.h = 0;
  case 3: break;
  }
  if (verbose & show_spec_bit) {
    printf("   (spec_read ");
    switch (size) {
    case 0: printf("%02x", val.l); break;
    case 1: printf("%04x", val.l); break;
    case 2: printf("%08x", val.l); break;
    case 3: printf("%08x%08x", val.h, val.l); break;
    }
    printf(" from %08x%08x at time %d)\n", addr.h, addr.l, ticks.l);
  }
  return shift_left(val, (8 - (1 << size) - (addr.l & 7)) << 3);
}
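
The final shift in spec_read positions the quantity within the octabyte that contains it, with byte 0 of the octabyte being the most significant, as befits a big-endian machine. The following sketch reproduces only the shift computation, using uint64_t in place of octa; lane_shift and place_in_octabyte are hypothetical names, not part of MMIX-MEM.

```c
#include <stdint.h>

/* Size codes 0..3 stand for byte, wyde, tetra, octa.  The result is the
   left shift, in bits, that puts a quantity of that size at address addr
   into its big-endian position inside the enclosing octabyte. */
unsigned lane_shift(uint64_t addr, int size)
{
    size &= 3;
    addr &= ~(((uint64_t)1 << size) - 1);   /* align, like addr.l &= -(1 << size) */
    return (unsigned)((8 - (1u << size) - (addr & 7)) << 3);
}

/* Shift a right-justified value into place. */
uint64_t place_in_octabyte(uint64_t val, uint64_t addr, int size)
{
    return val << lane_shift(addr, size);
}
```

A byte at an address that is a multiple of 8 is shifted left 56 bits, a byte at the last address of an octabyte is not shifted at all, and a full octabyte is never shifted.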

3. The default spec_write just reports its arguments, without actually writing
anything.

void spec_write ARGS((octa, octa, int));
void spec_write(addr, val, size)
  octa addr, val;
  int size;
{
  if (verbose & show_spec_bit) {
    size &= 0x3, addr.l &= -(1 << size);
    val = shift_right(val, (8 - (1 << size) - (addr.l & 7)) << 3, 1);
    printf("   (spec_write ");
    switch (size) {
    case 0: printf("%02x", val.l); break;
    case 1: printf("%04x", val.l); break;
    case 2: printf("%08x", val.l); break;
    case 3: printf("%08x%08x", val.h, val.l); break;
    }
    printf(" to %08x%08x at time %d)\n", addr.h, addr.l, ticks.l);
  }
}

4. Incidentally, the combined address a and size s could be transmitted in 64 bits
of an actual memory bus, because a is always a multiple of 2^s that is less than 2^63.
Thus (a, s) can be packed neatly into the 64-bit number 2a + 2^s. (Think about it.)
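
For readers who would rather check than think, here is a small sketch of the packing. It assumes that a is less than 2^63 (so that 2a fits in 64 bits), that a is a multiple of 2^s, and that s is between 0 and 3; the function names are hypothetical. Because 2a is then a multiple of 2^(s+1), the lowest 1 bit of 2a + 2^s sits exactly at position s, which is what makes the unpacking possible.

```c
#include <stdint.h>

/* Pack address a and size code s into one 64-bit word. */
uint64_t pack_addr_size(uint64_t a, int s)
{
    return 2 * a + ((uint64_t)1 << s);
}

/* Recover a and s.  The word w is never zero, so the loop terminates. */
void unpack_addr_size(uint64_t w, uint64_t *a, int *s)
{
    int k = 0;
    while (!((w >> k) & 1)) k++;          /* position of the lowest 1 bit */
    *s = k;
    *a = (w - ((uint64_t)1 << k)) >> 1;   /* remove the marker, halve */
}
```

For instance, the octabyte at address 0x0001000000000008 with s = 3 packs to 0x0002000000000018, and unpacking returns the original pair.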



ARGS = macro, MMIX-PIPE §6.
fgets: char *(), <stdio.h>.
h: tetra, MMIX-PIPE §17.
interactive_read_bit = 1 << 5, MMIX-PIPE §8.
l: tetra, MMIX-PIPE §17.
octa = struct, MMIX-PIPE §17.
printf: int (), <stdio.h>.
read_hex: octa (), MMMIX §17.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
show_spec_bit = 1 << 6, MMIX-PIPE §8.
stdin: FILE *, <stdio.h>.
ticks: Extern octa, MMIX-PIPE §87.
verbose: int, MMIX-PIPE §4.



MMIX-PIPE 

1. Introduction. This program is the heart of the meta-simulator for the
ultra-configurable MMIX pipeline: It defines the MMIX_run routine, which does most of the
work. Another routine, MMIX_init, is also defined here, and so is a header file called
mmix-pipe.h. The header file is used by the main routine and by other routines like
MMIX_config, which are compiled separately.

Readers of this program should be familiar with the explanation of MMIX architec- 
ture as presented in the main program module for MMMIX. 

A lot of subtle things can happen when instructions are executed in parallel. 
Therefore this simulator ranks among the most interesting and instructive programs 
in the author’s experience. The author has tried his best to make everything correct 
. . . but the chances for error are great. Anyone who discovers a bug is therefore 
urged to report it as soon as possible; please see http://mmix.cs.hm.edu/bugs/ for 
instructions. 

It sort of boggles the mind when one realizes that the present program might 
someday be translated by a C compiler for MMIX and used to simulate itself. 

2. This high-performance prototype of MMIX achieves its efficiency by means of 
“pipelining,” a technique of overlapping that is explained for the related DLX computer 
in Chapter 3 of Hennessy & Patterson’s book Computer Architecture (second edition). 
Other techniques such as “dynamic scheduling” and “multiple issue,” explained in 
Chapter 4 of that book, are used too. 

One good way to visualize the procedure is to imagine that somebody has organized 
a high-tech car repair shop according to similar principles. There are eight indepen- 
dent functional units, which we can think of as eight groups of auto mechanics, each 
specializing in a particular task; each group has its own workspace with room to deal 
with one car at a time. Group F (the “fetch” group) is in charge of rounding up 
customers and getting them to enter the assembly-line garage in an orderly fashion. 
Group D (the “decode and dispatch” group) does the initial vehicle inspection and 
writes up an order that explains what kind of servicing is required. The vehicles go 
next to one of the four “execution” groups: Group X handles routine maintenance, 
while groups XF, XM, and XD are specialists in more complex tasks that tend to take 
longer. (The XF people are good at floating the points, while the XM and XD groups 
are experts in multilink suspensions and differentials.) When the relevant X group 
has finished its work, cars drive to M station, where they send or receive messages and 
possibly pay money to members of the “memory” group. Finally all necessary parts 
are installed by members of group W, the “write” group, and the car leaves the shop. 
Everything is tightly organized so that in most cases the cars move in synchronized 
fashion from station to station, at regular 100-nanocentury intervals. 

In a similar way, most MMIX instructions can be handled in a five-stage pipeline, 
F-D-X-M-W, with X replaced by XF for floating-point addition or conversion, or by 
XM for multiplication, or by XD for division or square root. Each stage ideally takes 
one clock cycle, although XF, XM, and (especially) XD are slower. If the instructions 
enter in a suitable pattern, we might see one instruction being fetched, another being 
decoded, and up to four being executed, while another is accessing memory, and yet 
another is finishing up by writing new information into registers; all this is going on
simultaneously during one clock cycle. Pipelining with eight separate stages might
therefore make the machine run up to 8 times as fast as it could if each instruction
were being dealt with individually and without overlap. (Well, perfect speedup turns
out to be impossible, because of the shared M and W stages; the theory of knapsack
programming, to be discussed in Section 7.7 of The Art of Computer Programming,
tells us that the maximal achievable speedup is at most 8 - 1/p - 1/q - 1/r when XF,
XM, and XD have delays bounded by p, q, and r cycles. But we can achieve a factor
of more than 7 if we are very lucky.)

Consider, for example, the ADD instruction. This instruction enters the computer’s 
processing unit in F stage, taking only one clock cycle if it is in the cache of instructions 
recently seen. Then the D stage recognizes the command as an ADD and acquires the 
current values of $Y and $Z; meanwhile, of course, another instruction is being fetched 
by F. On the next clock cycle, the X stage adds the values together. This prepares the 
way for the M stage to watch for overflow and to get ready for any exceptional action 
that might be needed with respect to the settings of special register rA. Finally, on 
the fifth clock cycle, the sum is either written into $X or the trip handler for integer
overflow is invoked. Although this process has taken five clock cycles (that is, 5υ),
the net increase in running time has been only 1υ.
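
The arithmetic behind that remark can be sketched in a few lines. The functions below describe an idealized k-stage pipeline with no stalls, which is an assumption made only for this illustration; the names are hypothetical, and the simulator itself computes nothing of the kind.

```c
/* Cycles needed to finish n instructions when a new one can enter an
   ideal k-stage pipeline on every cycle: the first takes k cycles and
   each later one adds a single cycle. */
unsigned long pipelined_cycles(unsigned long n, unsigned k)
{
    return n ? k + (n - 1) : 0;
}

/* Cycles needed when each instruction must finish before the next starts. */
unsigned long unpipelined_cycles(unsigned long n, unsigned k)
{
    return n * k;
}
```

With k = 5, one ADD takes 5 cycles but two take only 6, and a hundred take 104 instead of 500.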

Of course congestion can occur, inside a computer as in a repair shop. For example, 
auto parts might not be readily available; or a car might have to sit in D station while 
waiting to move to XM, thereby blocking somebody else from moving from F to D. 
Sometimes there won’t necessarily be a steady stream of customers. In such cases the 
employees in some parts of the shop will occasionally be idle. But we assume that 
they always do their jobs as fast as possible, given the sequence of customers that 
they encounter. With a clever person setting up appointments — translation: with 
a clever programmer and/or compiler arranging MMIX instructions — the organization 
can often be expected to run at nearly peak capacity. 

In fact, this program is designed for experiments with many kinds of pipelines, po- 
tentially using additional functional units (such as several independent X groups) , and 
potentially fetching, dispatching, and executing several nonconflicting instructions si- 
multaneously. Such complications make this program more difficult than a simple 
pipeline simulator would be, but they also make it a lot more instructive because we 
can get a better understanding of the issues involved if we are required to treat them 
in greater generality. 



MMIX_config: void (), MMIX-CONFIG §38.
MMIX_init: void (), §10.
MMIX_run: void (), §10.
3. Here’s the overall structure of the present program module.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "abstime.h"
( Preprocessor definitions )
( Header definitions 6 )
( Type definitions 11 )
( Global variables 20 )
( External variables 4 )
( Internal prototypes 13 )
( External prototypes 9 )
( Subroutines 14 )
( External routines 10 )

4. The identifier Extern is used in MMIX-PIPE to declare variables that are
accessed in other modules. Actually all appearances of ‘Extern’ are defined to be blank
here, but ‘Extern’ will become ‘extern’ in the header file.

#define Extern /* blank for us, extern for them */
format Extern extern

( External variables 4 ) =

Extern int verbose; /* controls the level of diagnostic output */

See also sections 29, 59, 60, 66, 69, 77, 86, 87, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247, 284, 
and 349. 

This code is used in sections 3 and 5. 

5. The header file repeats the basic definitions and declarations. 

( mmix-pipe.h 5 ) =

#define Extern extern
( Header definitions 6 )
( Type definitions 11 )
( External variables 4 )
( External prototypes 9 )

6. Subroutines of this program are declared first with a prototype, as in ANSI C,
then with an old-style C function definition. The following preprocessor commands
make this work correctly with both new-style and old-style compilers.

( Header definitions 6 ) =

#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

See also sections 7, 8, 52, 57, 129, and 166. 

This code is used in sections 3 and 5. 
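
The double parentheses in every use of ARGS are what make the trick work: the whole parameter list, parentheses included, is a single macro argument. A minimal sketch follows; add_ints is a hypothetical function, and its definition is written in the modern style because some current compilers no longer accept old-style definitions.

```c
#ifdef __STDC__
#define ARGS(list) list      /* keep the parameter list */
#else
#define ARGS(list) ()        /* old compilers: drop it */
#endif

/* With a Standard C compiler this expands to
       int add_ints (int a, int b);
   and with an old-style compiler to
       int add_ints ();                                  */
int add_ints ARGS((int a, int b));

int add_ints(int a, int b)
{
    return a + b;
}
```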
7. Some of the names that are natural for this program are in conflict with library
names on at least one of the host computers in the author’s tests. So we bypass the
library names here.

( Header definitions 6 ) +=

#define random my_random
#define fsqrt my_fsqrt
#define div my_div

8. The amount of verbosity depends on the following bit codes. 

( Header definitions 6 ) += 

#define issue_bit (1 << 0)
    /* show control blocks when issued, deissued, committed */
#define pipe_bit (1 << 1) /* show the pipeline and locks on every cycle */
#define coroutine_bit (1 << 2) /* show the coroutines when started on every cycle */
#define schedule_bit (1 << 3) /* show the coroutines when scheduled */
#define uninit_mem_bit (1 << 4)
    /* complain when reading from an uninitialized chunk of memory */
#define interactive_read_bit (1 << 5)
    /* prompt user when reading from I/O location */
#define show_spec_bit (1 << 6)
    /* display special read/write transactions as they happen */
#define show_pred_bit (1 << 7) /* display branch prediction details */
#define show_wholecache_bit (1 << 8)
    /* display cache blocks even when their key tag is invalid */
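
Since each code is a distinct power of two, several kinds of output can be requested at once by combining codes with bitwise OR, and one test can cover a whole group of them. The sketch below repeats a few of the definitions so that it stands alone; wants_cycle_banner is a hypothetical helper that mirrors the test made by MMIX_run before it prints the cycle number.

```c
#define issue_bit     (1 << 0)
#define pipe_bit      (1 << 1)
#define coroutine_bit (1 << 2)
#define schedule_bit  (1 << 3)
#define show_spec_bit (1 << 6)

/* Nonzero if any of the four per-cycle diagnostics has been requested. */
int wants_cycle_banner(int verbose)
{
    return (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit)) != 0;
}
```

A setting such as pipe_bit | show_spec_bit (decimal 66) asks for the pipeline display and for reports of special reads and writes, and it triggers the cycle banner; show_spec_bit alone does not.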

9. The MMIX_init() routine should be called exactly once, after MMIX_config()
has done its work but before the simulator starts to execute any programs. Then
MMIX_run() can be called as often as the user likes.

( External prototypes 9 ) =

Extern void MMIX_init ARGS((void));
Extern void MMIX_run ARGS((int cycs, octa breakpoint));

See also sections 38, 161, 175, 178, 180, 209, 212, and 252. 

This code is used in sections 3 and 5. 



__STDC__, Standard C.
MMIX_config: void (), MMIX-CONFIG §38.
MMIX_init: void (), §10.
MMIX_run: void (), §10.
octa = struct, §17.
10. ( External routines 10 ) =
void MMIX_init()
{
  register int i, j;
  ( Initialize everything 22 );
}

void MMIX_run(cycs, breakpoint)
  int cycs;
  octa breakpoint;
{
  ( Local variables 12 );
  while (cycs) {
    if (verbose & (issue_bit | pipe_bit | coroutine_bit | schedule_bit))
      printf("*** Cycle %d\n", ticks.l);
    ( Perform one machine cycle 64 );
    if (verbose & pipe_bit) {
      print_pipe(); print_locks();
    }
    if (breakpoint_hit || halted) {
      if (breakpoint_hit)
        printf("Breakpoint instruction fetched at time %d\n", ticks.l - 1);
      if (halted) printf("Halted at time %d\n", ticks.l - 1);
      break;
    }
    cycs--;
  }
cease: ;
}

See also sections 39, 162, 176, 179, 181, 210, 213, and 253. 

This code is used in section 3. 

11. ( Type definitions 11 ) =
typedef enum {
  false, true, wow
} bool; /* slightly extended booleans */

See also sections 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, and 371. 

This code is used in sections 3 and 5. 

12. ( Local variables 12 ) =
register int i, j, m;
bool breakpoint_hit = false;
bool halted = false;

See also sections 124 and 258. 

This code is used in section 10. 
13. Error messages that abort this program are called panic messages. The macro
called confusion will never be needed unless this program is internally inconsistent.

#define errprint0(f) fprintf(stderr, f)
#define errprint1(f, a) fprintf(stderr, f, a)
#define errprint2(f, a, b) fprintf(stderr, f, a, b)
#define panic(x) { errprint0("Panic: "); x; errprint0("!\n"); expire(); }
#define confusion(m) errprint1("This can't happen: %s", m)

( Internal prototypes 13 ) =

static void expire ARGS((void));

See also sections 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171, 173, 182, 
184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, and 377. 

This code is used in section 3. 

14. ( Subroutines 14 ) =

static void expire() /* the last gasp before dying */
{
  if (ticks.h) errprint2("(Clock time is %dH+%d.)\n", ticks.h, ticks.l);
  else errprint1("(Clock time is %d.)\n", ticks.l);
  exit(-2);
}

See also sections 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174, 
183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, 
and 387. 

This code is used in section 3. 

15. The data structures of this program are not precisely equivalent to logical
gates that could be implemented directly in silicon; we will use data structures 
and algorithms appropriate to the C programming language. For example, we’ll use 
pointers and arrays, instead of buses and ports and latches. However, the net effect 
of our data structures and algorithms is intended to be equivalent to the net effect of 
a silicon implementation. The methods used below are essentially equivalent to those 
used in real machines today, except that diagnostic facilities are added so that we can 
readily watch what is happening. 

Each functional unit in the MMIX pipeline is programmed here as a coroutine in C. 
At every clock cycle, we will call on each active coroutine to do one phase of its 
operation; in terms of the repair-station analogy described in the main program, this 
corresponds to getting each group of auto mechanics to do one unit of operation on a 
car. The coroutines are performed sequentially, although a real pipeline would have 
them act in parallel. We will not “cheat” by letting one coroutine access a value early 
in its cycle that another one computes late in its cycle, unless computer hardware 
could “cheat” in an equivalent way. 
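
One common way to guarantee that sequential code never "cheats" is to compute every stage's next state from the old state only, and to commit all of the new values at the end of the cycle. The sketch below shows that double-buffering idea for a toy three-stage pipeline. It is an illustration of the constraint, not the method used in this program, which schedules coroutines instead; the type latch and the function clock_cycle are hypothetical.

```c
#define STAGES 3

typedef struct {
    int valid;   /* does this stage hold an item? */
    int value;   /* the item's current value */
} latch;

/* Simulate one clock cycle.  Stage 0 accepts a new input, if any; every
   later stage takes the item that its predecessor held at the START of
   the cycle and adds 1 to it.  All updates are computed into `next' and
   committed together, so no stage sees a value produced in this cycle. */
void clock_cycle(latch cur[STAGES], int have_input, int input)
{
    latch next[STAGES];
    int i;
    next[0].valid = have_input;
    next[0].value = input;
    for (i = 1; i < STAGES; i++) {
        next[i].valid = cur[i - 1].valid;
        next[i].value = cur[i - 1].value + 1;
    }
    for (i = 0; i < STAGES; i++) cur[i] = next[i];
}
```

An item that enters with value 10 reaches the last stage two cycles later with value 12, regardless of the order in which the stages are processed inside clock_cycle.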



ARGS = macro, §6.
coroutine_bit = 1 << 2, §8.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
h: tetra, §17.
issue_bit = 1 << 0, §8.
l: tetra, §17.
octa = struct, §17.
pipe_bit = 1 << 1, §8.
print_locks: void (), §39.
print_pipe: void (), §253.
printf: int (), <stdio.h>.
schedule_bit = 1 << 3, §8.
stderr: FILE *, <stdio.h>.
ticks: Extern octa, §87.
verbose: int, §4.
16. Low-level routines. Where should we begin? It is tempting to start with a
global view of the simulator and then to break it down into component parts. But 
that task is too daunting, because there are so many unknowns about what basic 
ingredients ought to be combined when we construct the larger components. So let us 
look first at the primitive operations on which the superstructure will be built. Once 
we have created some infrastructure, we’ll be able to proceed with confidence to the 
larger tasks ahead. 

17. This program for the 64-bit MMIX architecture is based on 32-bit integer arith- 
metic, because nearly every computer available to the author at the time of writing 
(1998-1999) was limited in that way. Details of the basic arithmetic appear in a sepa- 
rate program module called MMIX-ARITH, because the same routines are needed also 
for the assembler and for the non-pipelined simulator. The definition of type tetra 
should be changed, if necessary, to conform with the definitions found there. 

( Type definitions 11 ) +=

typedef unsigned int tetra;
    /* for systems conforming to the LP-64 data model */
typedef struct {
  tetra h, l;
} octa; /* two tetrabytes make one octabyte */
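
To see how 64-bit arithmetic can be built from two 32-bit halves, consider unsigned addition. The following is a sketch in the spirit of the oplus routine of MMIX-ARITH, not a copy of it; the name oplus_sketch is hypothetical, and uint32_t is used here in place of unsigned int so that the width of a tetra is explicit.

```c
#include <stdint.h>

typedef uint32_t tetra;                 /* assumed to be exactly 32 bits */
typedef struct { tetra h, l; } octa;    /* high and low halves */

/* Unsigned 64-bit sum y + z, modulo 2^64. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;                    /* wraps around modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l);      /* the low sum wrapped iff it is less than y.l */
    return x;
}
```

Adding 1 to the octabyte with h = 0 and l = 0xffffffff gives h = 1 and l = 0; the carry out of the high half is simply discarded.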

18. ( Internal prototypes 13 ) +=
static void print_octa ARGS((octa));

19. ( Subroutines 14 ) +=
static void print_octa(o)
  octa o;
{
  if (o.h) printf("%x%08x", o.h, o.l); else printf("%x", o.l);
}

20. ( Global variables 20 ) =

extern octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
extern octa neg_one; /* neg_one.h = neg_one.l = -1 */
extern octa aux; /* auxiliary output of a subroutine */
extern bool overflow; /* set by certain subroutines for signed arithmetic */
extern int exceptions; /* bits set by floating point operations */
extern int cur_round; /* the current rounding mode */

See also sections 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230, 235, 

238, 248, 285, 303, 305, 315, 374, 376, and 388. 

This code is used in section 3. 

21. Most of the subroutines in MMIX-ARITH return an octabyte as a function
of two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z.
Multiplication returns the high half of a product in the global variable aux; division
returns the remainder in aux.

( Subroutines 14 ) +=
extern octa oplus ARGS((octa y, octa z)); /* unsigned y + z */
extern octa ominus ARGS((octa y, octa z)); /* unsigned y - z */
extern octa incr ARGS((octa y, int delta)); /* unsigned y + δ (δ is signed) */
extern octa oand ARGS((octa y, octa z)); /* y & z */
extern octa oandn ARGS((octa y, octa z)); /* y & ~z */
extern octa shift_left ARGS((octa y, int s)); /* y << s, 0 <= s <= 64 */
extern octa shift_right ARGS((octa y, int s, int u)); /* y >> s, signed if !u */
extern octa omult ARGS((octa y, octa z)); /* unsigned (aux, x) = y × z */
extern octa signed_omult ARGS((octa y, octa z)); /* signed x = y × z, setting overflow */
extern octa odiv ARGS((octa x, octa y, octa z)); /* unsigned (x, y)/z; aux = (x, y) mod z */
extern octa signed_odiv ARGS((octa y, octa z)); /* signed y/z, when z != 0; aux = y mod z */
extern int count_bits ARGS((tetra z)); /* x = ν(z) */
extern tetra byte_diff ARGS((tetra y, tetra z)); /* half of BDIF */
extern tetra wyde_diff ARGS((tetra y, tetra z)); /* half of WDIF */
extern octa bool_mult ARGS((octa y, octa z, bool xor)); /* MOR or MXOR */
extern octa load_sf ARGS((tetra z)); /* load short float */
extern tetra store_sf ARGS((octa x)); /* store short float */
extern octa fplus ARGS((octa y, octa z)); /* floating point x = y ⊕ z */
extern octa fmult ARGS((octa y, octa z)); /* floating point x = y ⊗ z */
extern octa fdivide ARGS((octa y, octa z)); /* floating point x = y ⊘ z */
extern octa froot ARGS((octa, int)); /* floating point x = √z */
extern octa fremstep ARGS((octa y, octa z, int delta)); /* floating point x rem z = y rem z */
extern octa fintegerize ARGS((octa z, int mode)); /* floating point x = round(z) */
extern int fcomp ARGS((octa y, octa z)); /* -1, 0, 1, or 2 if y < z, y = z, y > z, y ∥ z */
extern int fepscomp ARGS((octa y, octa z, octa eps, int sim)); /* x = sim ? [y ∼ z (ε)] : [y ≈ z (ε)] */
extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt)); /* fix to float */
extern octa fixit ARGS((octa z, int mode)); /* float to fix */
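
To make the two-tetrabyte convention concrete, here is a minimal sketch of how unsigned addition and subtraction on the octa representation can be carried out. The names oplus_sketch and ominus_sketch are hypothetical stand-ins; the real routines live in MMIX-ARITH, and this sketch only assumes that tetra is a 32-bit unsigned type.

```c
#include <stdint.h>

typedef uint32_t tetra;              /* assumption: exactly 32 bits wide */
typedef struct { tetra h, l; } octa;

/* Unsigned y + z: add the halves, then propagate a carry out of the low half. */
static octa oplus_sketch(octa y, octa z)
{
  octa x;
  x.h = y.h + z.h;
  x.l = y.l + z.l;
  if (x.l < y.l) x.h++;   /* the low half wrapped around, so carry into h */
  return x;
}

/* Unsigned y - z: subtract the halves, then propagate a borrow. */
static octa ominus_sketch(octa y, octa z)
{
  octa x;
  x.h = y.h - z.h;
  x.l = y.l - z.l;
  if (x.l > y.l) x.h--;   /* the low half wrapped around, so borrow from h */
  return x;
}
```

The carry test relies on unsigned wraparound: a sum of the low halves that is smaller than one of its operands can only arise from overflow.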



ARGS = macro, §6.
aux: octa, MMIX-ARITH §4.
bool = enum, §11.
bool_mult: octa (), MMIX-ARITH §29.
byte_diff: tetra (), MMIX-ARITH §27.
count_bits: int (), MMIX-ARITH §26.
cur_round: int, MMIX-ARITH §30.
exceptions: int, MMIX-ARITH §32.
fcomp: int (), MMIX-ARITH §85.
fdivide: octa (), MMIX-ARITH §44.
fepscomp: int (), MMIX-ARITH §50.
fintegerize: octa (), MMIX-ARITH §86.
fixit: octa (), MMIX-ARITH §88.
floatit: octa (), MMIX-ARITH §89.
fmult: octa (), MMIX-ARITH §41.
fplus: octa (), MMIX-ARITH §46.
fremstep: octa (), MMIX-ARITH §93.
froot: octa (), MMIX-ARITH §91.
incr: octa (), MMIX-ARITH §6.
load_sf: octa (), MMIX-ARITH §39.
neg_one: octa, MMIX-ARITH §4.
oand: octa (), MMIX-ARITH §25.
oandn: octa (), MMIX-ARITH §25.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
oplus: octa (), MMIX-ARITH §5.
overflow: bool, MMIX-ARITH §4.
printf: int (), <stdio.h>.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
store_sf: tetra (), MMIX-ARITH §40.
wyde_diff: tetra (), MMIX-ARITH §28.
zero_octa: octa, MMIX-ARITH §4.
22. We had better check that our 32-bit assumption holds.

( Initialize everything 22 ) =
if (shift_left(neg_one, 1).h != 0xffffffff)
  panic(errprint0("Incorrect implementation of type tetra"));
See also sections 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, and 286.
This code is used in section 10.
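
The idea behind this check can be sketched as follows: if tetra were wider than 32 bits, shifting an all-ones octabyte left by one position would leave more than 32 one-bits in the high half. The names shift_left_sketch and tetra_looks_32_bit are hypothetical; the shifting routine is written in the style of the two-tetra representation and is not claimed to be the MMIX-ARITH code.

```c
typedef unsigned int tetra;           /* assumed to be exactly 32 bits wide */
typedef struct { tetra h, l; } octa;

/* Shift y left by s bits, 0 <= s <= 64, in the two-tetra representation. */
static octa shift_left_sketch(octa y, int s)
{
  while (s >= 32) { y.h = y.l; y.l = 0; s -= 32; }
  if (s) {
    tetra yhl = y.h << s, ylh = y.l >> (32 - s);
    y.h = yhl + ylh;   /* bits leaving the low half enter the high half */
    y.l <<= s;
  }
  return y;
}

/* Returns 1 if tetra behaves like a 32-bit quantity, 0 otherwise. */
static int tetra_looks_32_bit(void)
{
  octa neg_one;
  neg_one.h = neg_one.l = (tetra)-1;
  return shift_left_sketch(neg_one, 1).h == 0xffffffff;
}
```

On a platform where unsigned int has 32 bits the function returns 1; a wider type would keep extra high-order bits and fail the comparison.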
23. Coroutines. As stated earlier, this program can be regarded as a system
of interacting coroutines. Coroutines — sometimes called threads — are more or less 
independent processes that share and pass data and control back and forth. They 
correspond to the individual workers in an organization. 

We don’t need the full power of recursive coroutines, in which new threads are 
spawned dynamically and have independent stacks for computation; we are, after all, 
simulating a fixed piece of hardware. The total number of coroutines we deal with is 
established once and for all by the MMIX_config routine, and each coroutine has a
fixed amount of local data. 

The simulation operates one clock tick at a time, by executing all coroutines 
scheduled for time t before advancing to time t + 1. The coroutines at time t may 
decide to become dormant or they may reschedule themselves and/or other coroutines 
for future times. 

Each coroutine has a symbolic name for diagnostic purposes (e.g., ALU1); a
nonnegative stage number (e.g., 2 for the second stage of a pipeline); a pointer to the
next coroutine scheduled at the same time (or Λ, the null pointer, if the coroutine is
unscheduled); a pointer to a lock variable (or Λ if no lock is currently relevant); and a
reference to a control block containing the data to be processed.

( Type definitions 11 ) +=
typedef struct coroutine_struct {
  char *name; /* symbolic identification of a coroutine */
  int stage; /* its rank */
  struct coroutine_struct *next; /* its successor */
  struct coroutine_struct **lockloc; /* what it might be locking */
  struct control_struct *ctl; /* its data */
} coroutine;

24. ( Internal prototypes 13 ) +=
static void print_coroutine_id ARGS((coroutine *));
static void errprint_coroutine_id ARGS((coroutine *));



ARGS = macro, §6.
control_struct: struct, §44.
errprint0 = macro (), §13.
h: tetra, §17.
MMIX_config: void (), MMIX-CONFIG §38.
neg_one: octa, MMIX-ARITH §4.
panic = macro (), §13.
shift_left: octa (), MMIX-ARITH §7.
25. ( Subroutines 14 ) +=
static void print_coroutine_id(c)
  coroutine *c;
{
  if (c) printf("%s:%d", c->name, c->stage);
  else printf("??");
}

static void errprint_coroutine_id(c)
  coroutine *c;
{
  if (c) errprint2("%s:%d", c->name, c->stage);
  else errprint0("??");
}

26. Coroutine control is masterminded by a ring of queues, one each for times t,
t + 1, ..., t + ring_size - 1, when t is the current clock time.

All scheduling is first-come-first-served, except that coroutines with higher stage
numbers have priority. We want to process the later stages of a pipeline first, in this
sequential implementation, for the same reason that a car must drive from M station
into W station before another car can enter M station.

Each queue is a circular list of coroutine nodes, linked together by their next
fields. A list head h with stage = max_stage comes at the end and the beginning
of the queue. (All stage numbers of legitimate coroutines are less than max_stage.)
The queued items are h->next, h->next->next, etc., from back to front, and we have
c->stage ≤ c->next->stage unless c = h.

Initially all queues are empty.

( Initialize everything 22 ) +=
{ register coroutine *p;
  for (p = ring; p < ring + ring_size; p++) p->next = p;
}

27. To schedule a coroutine c with positive delay d < ring_size, we call the function
schedule(c, d, s). (The s parameter is used only if scheduling is being logged; it does
not affect the computation, but we will generally set s to the state at which the
scheduled coroutine will begin.)

( Internal prototypes 13 ) +=
static void schedule ARGS((coroutine *, int, int));

28. ( Subroutines 14 ) +=
static void schedule(c, d, s)
  coroutine *c;
  int d, s;
{
  register int tt = (cur_time + d) % ring_size;
  register coroutine *p = &ring[tt]; /* start at the list head */
  if (d <= 0 || d >= ring_size) /* do a sanity check */
    panic(confusion("Scheduling "); errprint_coroutine_id(c);
          errprint1(" with delay %d", d));
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
  if (verbose & schedule_bit) {
    printf(" scheduling "); print_coroutine_id(c);
    printf(" at time %d, state %d\n", ticks.l + d, s);
  }
}

29. ( External variables 4 ) +=
Extern int ring_size; /* set by MMIX_config, must be sufficiently large */
Extern coroutine *ring;
Extern int cur_time;
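
The queue discipline of the preceding sections can be exercised with a small self-contained model. This is only a sketch under stated assumptions: RING_SIZE is a fixed hypothetical constant (the simulator obtains ring_size from MMIX_config), the node type co keeps just the fields needed here, and schedule_sketch reports a bad delay through its return value instead of calling panic.

```c
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical; the real ring_size is set by MMIX_config */
#define MAX_STAGE 99  /* stage of every list head */

typedef struct co { const char *name; int stage; struct co *next; } co;

static co ring[RING_SIZE];  /* one list head per future clock time */
static int cur_time = 0;

/* Make every queue empty: each head points to itself. */
static void ring_init(void)
{
  for (int t = 0; t < RING_SIZE; t++) {
    ring[t].name = "head";
    ring[t].stage = MAX_STAGE;
    ring[t].next = &ring[t];
  }
}

/* Insert c into the queue for time cur_time + d, keeping stages
   nondecreasing along the next links. Returns 0 on success, -1 if
   the delay is outside 0 < d < RING_SIZE. */
static int schedule_sketch(co *c, int d)
{
  co *p;
  if (d <= 0 || d >= RING_SIZE) return -1;
  p = &ring[(cur_time + d) % RING_SIZE];
  while (p->next->stage < c->stage) p = p->next;
  c->next = p->next;
  p->next = c;
  return 0;
}
```

Because a newcomer is inserted behind all entries of smaller stage and in front of none of equal stage, the entries read from back to front; among coroutines of equal stage, the one scheduled first ends up nearer the front.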

30. The all-important ctl field of a coroutine, which contains the data being manip- 
ulated, will be explained below. One of its key components is the state field, which 
helps to specify the next actions the coroutine will perform. When we schedule a 
coroutine for a new task, we often want it to begin in state 0. 

( Internal prototypes 13 ) +=
static void startup ARGS((coroutine *, int));

31. ( Subroutines 14 ) +=
static void startup(c, d)
  coroutine *c;
  int d;
{
  c->ctl->state = 0;
  schedule(c, d, 0);
}



ARGS = macro, §6.
confusion = macro (), §13.
coroutine = struct, §23.
ctl: control *, §23.
errprint0 = macro (), §13.
errprint1 = macro (), §13.
errprint2 = macro (), §13.
Extern = macro, §4.
l: tetra, §17.
max_stage = 99, §129.
MMIX_config: void (), MMIX-CONFIG §38.
name: char *, §23.
next: coroutine *, §23.
panic = macro (), §13.
printf: int (), <stdio.h>.
schedule_bit = 1 << 3, §8.
stage: int, §23.
state: int, §44.
ticks: Extern octa, §87.
verbose: int, §4.
32. The following routine removes a coroutine from whatever queue it’s in. The
case c->next = c is also permitted; such a self-loop can occur when a coroutine goes to
sleep and expects to be awakened (that is, scheduled) by another coroutine. Sleeping
coroutines have important data in their ctl field; they are therefore quite different from
unscheduled or “unemployed” coroutines, which have c->next = Λ. An unemployed
coroutine is not assumed to have any valid data in its ctl field.

( Internal prototypes 13 ) +=
static void unschedule ARGS((coroutine *));

33. ( Subroutines 14 ) +=
static void unschedule(c)
  coroutine *c;
{ register coroutine *p;
  if (c->next) {
    for (p = c; p->next != c; p = p->next) ;
    p->next = c->next;
    c->next = NULL;
    if (verbose & schedule_bit) {
      printf(" unscheduling "); print_coroutine_id(c); printf("\n");
    }
  }
}
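
The three cases distinguished above (queued, sleeping in a self-loop, and unemployed) can be checked with a stripped-down model. The sketch below assumes a reduced node type co with only the next link that matters here, and omits the logging; unschedule_sketch is a hypothetical name.

```c
#include <stddef.h>

typedef struct co { int stage; struct co *next; } co;

/* Remove c from the circular list it belongs to, if any.
   A sleeping coroutine (c->next == c) simply becomes unemployed;
   an unemployed coroutine (c->next == NULL) is left alone. */
static void unschedule_sketch(co *c)
{
  co *p;
  if (c->next) {
    for (p = c; p->next != c; p = p->next) ;  /* find the predecessor of c */
    p->next = c->next;
    c->next = NULL;
  }
}
```

Since the list is singly linked, finding the predecessor costs one trip around the ring; that is acceptable here because each queue holds only the coroutines scheduled for a single clock time.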

34. When it is time to process all coroutines that have queued up for a particular
time t, we empty the queue called ring[t] and link its items in the opposite order (from
front to back). The following subroutine uses the well known algorithm discussed in
exercise 2.2.3-7 of The Art of Computer Programming.

( Internal prototypes 13 ) +=
static coroutine *queuelist ARGS((int));

35. ( Subroutines 14 ) +=
static coroutine *queuelist(t)
  int t;
{ register coroutine *p, *q = &sentinel, *r;
  for (p = ring[t].next; p != &ring[t]; p = r) {
    r = p->next;
    p->next = q;
    q = p;
  }
  ring[t].next = &ring[t];
  sentinel.next = q;
  return q;
}

36. ( Global variables 20 ) +=
coroutine sentinel; /* dummy coroutine at origin of circular list */
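
The in-place list reversal can be observed on a three-element queue. This is a sketch with hypothetical names (queuelist_sketch, sentinel_sketch) and a reduced node type; it takes the list head as a pointer instead of indexing a global ring, which is an assumption made only to keep the example self-contained.

```c
#include <stddef.h>

typedef struct co { int stage; struct co *next; } co;

static co sentinel_sketch;  /* dummy node that terminates the reversed list */

/* Empty the circular queue with the given head and return its items
   linked in the opposite order, ending at the sentinel. */
static co *queuelist_sketch(co *head)
{
  co *p, *q = &sentinel_sketch, *r;
  for (p = head->next; p != head; p = r) {
    r = p->next;   /* remember the rest of the old list */
    p->next = q;   /* point p back at what has been reversed so far */
    q = p;
  }
  head->next = head;          /* the queue is now empty */
  sentinel_sketch.next = q;   /* the sentinel closes the new circle */
  return q;
}
```

With a queue that reads c, a, b from back to front, the returned list starts at b, the item with the highest stage, which is the order in which the simulator wants to run the coroutines.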
37. Coroutines often start working on tasks that are speculative, in the sense that
we want certain results to be ready if they prove to be useful; we understand that 
speculative computations might not actually be needed. Therefore a coroutine might 
need to be aborted before it has finished its work. 

All coroutines must be written in such a way that important data structures remain 
intact even when the coroutine is abruptly terminated. In particular, we need to 
be sure that “locks” on shared resources are restored to an unlocked state when a 
coroutine holding the lock is aborted. 

A lockvar variable is Λ when it is unlocked; otherwise it points to the coroutine
responsible for unlocking it.

#define set_lock(c, l) { l = c; (c)->lockloc = &(l); }
#define release_lock(c, l) { l = NULL; (c)->lockloc = NULL; }

( Type definitions 11 ) +=
typedef coroutine *lockvar;
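
The reason a coroutine records where its lock lives is that somebody else may have to release the lock on its behalf. The sketch below illustrates that idea with a reduced coroutine type; the macro names carry a _sketch suffix because they use the do/while(0) form, a variant chosen here for statement safety, and abort_sketch is a hypothetical helper, not a routine of MMIX-PIPE.

```c
#include <stddef.h>

typedef struct co { struct co **lockloc; } co;  /* reduced coroutine */
typedef co *lockvar;  /* NULL when unlocked, else the responsible coroutine */

#define set_lock_sketch(c, l) \
  do { (l) = (c); (c)->lockloc = &(l); } while (0)
#define release_lock_sketch(c, l) \
  do { (l) = NULL; (c)->lockloc = NULL; } while (0)

/* Hypothetical cleanup for an aborted coroutine: whatever lock it
   holds is found through lockloc and restored to the unlocked state. */
static void abort_sketch(co *c)
{
  if (c->lockloc) {
    *(c->lockloc) = NULL;
    c->lockloc = NULL;
  }
}
```

The back pointer lockloc is what makes the cleanup possible without knowing which of the many lock variables the coroutine had acquired.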

38. ( External prototypes 9 ) +=
Extern void print_locks ARGS((void));

39. ( External routines 10 ) +=
void print_locks()
{
  print_cache_locks(ITcache);
  print_cache_locks(DTcache);
  print_cache_locks(Icache);
  print_cache_locks(Dcache);
  print_cache_locks(Scache);
  if (mem_lock) printf("mem locked by %s:%d\n", mem_lock->name, mem_lock->stage);
  if (dispatch_lock)
    printf("dispatch locked by %s:%d\n", dispatch_lock->name, dispatch_lock->stage);
  if (wbuf_lock) printf("head of write buffer locked by %s:%d\n", wbuf_lock->name,
      wbuf_lock->stage);
  if (clean_lock)
    printf("cleaner locked by %s:%d\n", clean_lock->name, clean_lock->stage);
  if (speed_lock) printf("write buffer flush locked by %s:%d\n", speed_lock->name,
      speed_lock->stage);
}



ARGS = macro, §6.
clean_lock: lockvar, §230.
coroutine = struct, §23.
ctl: control *, §23.
Dcache: cache *, §168.
dispatch_lock: lockvar, §65.
DTcache: cache *, §168.
Extern = macro, §4.
Icache: cache *, §168.
ITcache: cache *, §168.
lockloc: coroutine **, §23.
mem_lock: lockvar, §214.
name: char *, §23.
next: coroutine *, §23.
print_cache_locks: static void (), §174.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
ring: coroutine *, §29.
Scache: cache *, §168.
schedule_bit = 1 << 3, §8.
speed_lock: lockvar, §247.
stage: int, §23.
verbose: int, §4.
wbuf_lock: lockvar, §247.
40. Many of the quantities we deal with are speculative values that might not yet
have been certified as part of the “real” calculation; in fact, they might not yet have 
been calculated. 

A spec consists of a 64-bit quantity o and a pointer p to a specnode. The value o
is meaningful only if the pointer p is Λ; otherwise p points to a source of further
information.

A specnode is a 64-bit quantity o together with links to other specnodes that are 
above it or below it in a doubly linked list. An additional known bit tells whether the 
o field has been calculated. There also is a 64-bit addr field, to identify the list and 
give further information. A specnode list keeps track of speculative values related 
to a specific register or to all of main memory; we will discuss such lists in detail later. 
( Type definitions 11 ) +=
typedef struct {
  octa o;
  struct specnode_struct *p;
} spec;

typedef struct specnode_struct {
  octa o;
  bool known;
  octa addr;
  struct specnode_struct *up, *down;
} specnode;
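
The rule that o is meaningful only when p is Λ suggests a simple way to consume a spec once its producer has finished. The sketch below is an illustration under assumptions: resolve_spec_sketch is a hypothetical helper, bool comes from <stdbool.h> rather than from the enum of section 11, and tetra is taken to be a 32-bit type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tetra;              /* assumption: exactly 32 bits wide */
typedef struct { tetra h, l; } octa;

typedef struct specnode_struct {
  octa o;
  bool known;
  octa addr;
  struct specnode_struct *up, *down;
} specnode;

typedef struct {
  octa o;
  specnode *p;
} spec;

/* Hypothetical helper: returns true if s->o now holds a usable value.
   If the value was still pending (p != NULL) but its producer has
   marked it known, copy it over and drop the link. */
static bool resolve_spec_sketch(spec *s)
{
  if (s->p == NULL) return true;
  if (s->p->known) {
    s->o = s->p->o;
    s->p = NULL;
    return true;
  }
  return false;   /* still waiting for the producer */
}
```

A consumer that gets false back would typically wait and try again on a later clock tick.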

41. ( Global variables 20 ) +=
spec zero_spec; /* zero_spec.o.h = zero_spec.o.l = 0 and zero_spec.p = Λ */

42. ( Internal prototypes 13 ) +=
static void print_spec ARGS((spec));

43. ( Subroutines 14 ) +=
static void print_spec(s)
  spec s;
{
  if (!s.p) print_octa(s.o);
  else {
    printf(">"); print_specnode_id(s.p->addr);
  }
}

static void print_specnode(s)
  specnode s;
{
  if (s.known) { print_octa(s.o); printf("!"); }
  else if (s.o.h || s.o.l) { print_octa(s.o); printf("?"); }
  else printf("?");
  print_specnode_id(s.addr);
}
ARGS = macro, §6.
bool = enum, §11.
h: tetra, §17.
l: tetra, §17.
octa = struct, §17.
print_octa: static void (), §19.
print_specnode_id: static void (), §91.
printf: int (), <stdio.h>.
44. The analog of an automobile in our simulator is a block of data called control,
which represents all the relevant facts about an MMIX instruction. We can think of it 
as the work order attached to a car’s windshield. Each group of employees updates 
the work order as the car moves through the shop. 

A control record contains the original location of an instruction, and its four bytes
OP X Y Z. An instruction has up to four inputs, which are spec records called y, z,
b, and ra; it also has up to three outputs, which are specnode records called x, a,
and rl. (We usually don’t mention the special input ra or the special output rl, which
refer to MMIX’s internal registers rA and rL.) For example, the main inputs to a DIVU
command are $Y, $Z, and rD; the outputs are the quotient $X and the remainder rR.
The inputs to a STO command are $Y, $Z, and $X; there is one “output,” and the
field x.addr will be set to the physical address of the memory location corresponding
to virtual address $Y + $Z.
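
As a small illustration of the four bytes OP X Y Z, the following sketch splits a 32-bit instruction word into its fields, with OP taken from the most significant byte. The type insn_bytes and the function split_instruction_sketch are hypothetical names introduced for this example only.

```c
#include <stdint.h>

typedef struct { unsigned char op, xx, yy, zz; } insn_bytes;

/* Split a tetrabyte instruction word into OP, X, Y, Z,
   reading from the most significant byte down. */
static insn_bytes split_instruction_sketch(uint32_t w)
{
  insn_bytes f;
  f.op = (unsigned char)((w >> 24) & 0xff);
  f.xx = (unsigned char)((w >> 16) & 0xff);
  f.yy = (unsigned char)((w >> 8) & 0xff);
  f.zz = (unsigned char)(w & 0xff);
  return f;
}
```

For the word 0x20010203 this yields OP = 0x20, X = 1, Y = 2, Z = 3; the opcode value 0x20 is the position of ADD in the opcode list of section 47.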

Each control block also points to the coroutine that owns it, if any. And it has 
various other fields that contain other tidbits of information; for example, we have 
already mentioned the state field, which often governs a coroutine’s actions. The 
i field, which contains an internal operation code number, is generally used together 
with state to switch between alternative computational steps. If, for example, the
op field is SUB or SUBI or NEG or NEGI, the internal opcode i will be simply sub. We
shall define all the fields of control records now and discuss them later.

An actual hardware implementation of MMIX wouldn’t need all the information 
we are putting into a control block. Some of that information would typically be 
latched between stages of a pipeline; other portions would probably appear in so-called 
“rename registers.” We simulate rename registers only indirectly, by counting how 
many registers of that kind would be in use if we were mimicking low-level hardware 
details more precisely. The go field is a specnode for convenience in programming, 
although we use only its known and o subfields. It generally contains the address of 
the subsequent instruction. 



( Type definitions 11 ) +=
( Declare mmix_opcode and internal_opcode 47 )
typedef struct control_struct {
  octa loc; /* virtual address where an instruction originated */
  mmix_opcode op; unsigned char xx, yy, zz; /* the original instruction bytes */
  spec y, z, b, ra; /* inputs */
  specnode x, a, go, rl; /* outputs */
  coroutine *owner; /* a coroutine whose ctl this is */
  internal_opcode i; /* internal opcode */
  int state; /* internal mindset */
  bool usage; /* should rU be increased? */
  bool need_b; /* should we stall until b.p = Λ? */
  bool need_ra; /* should we stall until ra.p = Λ? */
  bool ren_x; /* does x correspond to a rename register? */
  bool mem_x; /* does x correspond to a memory write? */
  bool ren_a; /* does a correspond to a rename register? */
  bool set_l; /* does rl correspond to a new value of rL? */
  bool interim; /* does this instruction need to be reissued on interrupt? */
  bool stack_alert; /* is there potential for stack overflow? */
  unsigned int arith_exc; /* arithmetic exceptions for event bits of rA */
  unsigned int hist; /* history bits for use in branch prediction */
  int denin, denout; /* execution time penalties for subnormal handling */
  octa cur_O, cur_S; /* speculative rO and rS before this instruction */
  unsigned int interrupt; /* does this instruction generate an interrupt? */
  void *ptr_a, *ptr_b, *ptr_c; /* generic pointers for miscellaneous use */
} control;



addr: octa, §40.
bool = enum, §11.
coroutine = struct, §23.
ctl: control *, §23.
internal_opcode = enum, §49.
known: bool, §40.
mmix_opcode = enum, §47.
o: octa, §40.
octa = struct, §17.
p: specnode *, §40.
spec = struct, §40.
specnode = struct, §40.
sub = 31, §49.
45. ( Internal prototypes 13 ) +=
static void print_control_block ARGS((control *));

46. ( Subroutines 14 ) +=
static void print_control_block(c)
  control *c;
{
  octa default_go;
  if (c->loc.h || c->loc.l || c->op || c->xx || c->yy || c->zz || c->owner) {
    print_octa(c->loc);
    printf(": %02x%02x%02x%02x(%s)", c->op, c->xx, c->yy, c->zz, internal_op_name[c->i]);
  }
  if (c->usage) printf("*");
  if (c->interim) printf("+");
  if (c->y.o.h || c->y.o.l || c->y.p) { printf(" y="); print_spec(c->y); }
  if (c->z.o.h || c->z.o.l || c->z.p) { printf(" z="); print_spec(c->z); }
  if (c->b.o.h || c->b.o.l || c->b.p || c->need_b) {
    printf(" b="); print_spec(c->b);
    if (c->need_b) printf("*");
  }
  if (c->need_ra) { printf(" rA="); print_spec(c->ra); }
  if (c->ren_x || c->mem_x) { printf(" x="); print_specnode(c->x); }
  else if (c->x.o.h || c->x.o.l) {
    printf(" x="); print_octa(c->x.o); printf("%c", c->x.known ? '!' : '?');
  }
  if (c->ren_a) { printf(" a="); print_specnode(c->a); }
  if (c->set_l) { printf(" rL="); print_specnode(c->rl); }
  if (c->interrupt) { printf(" int="); print_bits(c->interrupt); }
  if (c->arith_exc) { printf(" exc="); print_bits(c->arith_exc << 8); }
  default_go = incr(c->loc, 4);
  if (c->go.o.l != default_go.l || c->go.o.h != default_go.h) {
    printf(" ->"); print_octa(c->go.o);
  }
  if (verbose & show_pred_bit) printf(" hist=%x", c->hist);
  if (c->i == pop) {
    printf(" rS=");
    print_octa(c->cur_S);
    printf(" rO=");
    print_octa(c->cur_O);
  }
  printf(" state=%d", c->state);
}
a: specnode, §44.
ARGS = macro, §6.
arith_exc: unsigned int, §44.
b: spec, §44.
control = struct, §44.
cur_O: octa, §44.
cur_S: octa, §44.
go = 72, §49.
h: tetra, §17.
hist: unsigned int, §44.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
interim: bool, §44.
internal_op_name: char *[], §50.
interrupt: unsigned int, §44.
known: bool, §40.
l: tetra, §17.
loc: octa, §44.
mem_x: bool, §44.
need_b: bool, §44.
need_ra: bool, §44.
o: octa, §40.
octa = struct, §17.
op: mmix_opcode, §44.
owner: coroutine *, §44.
p: specnode *, §40.
pop = 75, §49.
print_bits: static void (), §56.
print_octa: static void (), §19.
print_spec: static void (), §43.
print_specnode: static void (), §43.
printf: int (), <stdio.h>.
ra: spec, §44.
ren_a: bool, §44.
ren_x: bool, §44.
rl: specnode, §44.
set_l: bool, §44.
show_pred_bit = 1 << 7, §8.
state: int, §44.
usage: bool, §44.
verbose: int, §4.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zz: unsigned char, §44.
47. Lists. Here is a (boring) list of all the MMIX opcodes, in order.

( Declare mmix_opcode and internal_opcode 47 ) =
typedef enum {
  TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
  FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
  FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
  MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
  ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
  IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
  CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
  SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
  BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
  BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
  PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
  PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
  CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
  CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
  ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
  ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
  LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
  LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
  LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
  LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
  STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
  STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
  STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
  SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
  OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
  AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
  BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
  MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
  SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
  ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
  JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
  POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
} mmix_opcode;

See also section 49. 

This code is used in section 44. 
48. ( Global variables 20 ) +=
char *opcode_name[] = {
  "TRAP", "FCMP", "FUN", "FEQL", "FADD", "FIX", "FSUB", "FIXU",
  "FLOT", "FLOTI", "FLOTU", "FLOTUI", "SFLOT", "SFLOTI", "SFLOTU", "SFLOTUI",
  "FMUL", "FCMPE", "FUNE", "FEQLE", "FDIV", "FSQRT", "FREM", "FINT",
  "MUL", "MULI", "MULU", "MULUI", "DIV", "DIVI", "DIVU", "DIVUI",
  "ADD", "ADDI", "ADDU", "ADDUI", "SUB", "SUBI", "SUBU", "SUBUI",
  "2ADDU", "2ADDUI", "4ADDU", "4ADDUI", "8ADDU", "8ADDUI", "16ADDU", "16ADDUI",
  "CMP", "CMPI", "CMPU", "CMPUI", "NEG", "NEGI", "NEGU", "NEGUI",
  "SL", "SLI", "SLU", "SLUI", "SR", "SRI", "SRU", "SRUI",
  "BN", "BNB", "BZ", "BZB", "BP", "BPB", "BOD", "BODB",
  "BNN", "BNNB", "BNZ", "BNZB", "BNP", "BNPB", "BEV", "BEVB",
  "PBN", "PBNB", "PBZ", "PBZB", "PBP", "PBPB", "PBOD", "PBODB",
  "PBNN", "PBNNB", "PBNZ", "PBNZB", "PBNP", "PBNPB", "PBEV", "PBEVB",
  "CSN", "CSNI", "CSZ", "CSZI", "CSP", "CSPI", "CSOD", "CSODI",
  "CSNN", "CSNNI", "CSNZ", "CSNZI", "CSNP", "CSNPI", "CSEV", "CSEVI",
  "ZSN", "ZSNI", "ZSZ", "ZSZI", "ZSP", "ZSPI", "ZSOD", "ZSODI",
  "ZSNN", "ZSNNI", "ZSNZ", "ZSNZI", "ZSNP", "ZSNPI", "ZSEV", "ZSEVI",
  "LDB", "LDBI", "LDBU", "LDBUI", "LDW", "LDWI", "LDWU", "LDWUI",
  "LDT", "LDTI", "LDTU", "LDTUI", "LDO", "LDOI", "LDOU", "LDOUI",
  "LDSF", "LDSFI", "LDHT", "LDHTI", "CSWAP", "CSWAPI", "LDUNC", "LDUNCI",
  "LDVTS", "LDVTSI", "PRELD", "PRELDI", "PREGO", "PREGOI", "GO", "GOI",
  "STB", "STBI", "STBU", "STBUI", "STW", "STWI", "STWU", "STWUI",
  "STT", "STTI", "STTU", "STTUI", "STO", "STOI", "STOU", "STOUI",
  "STSF", "STSFI", "STHT", "STHTI", "STCO", "STCOI", "STUNC", "STUNCI",
  "SYNCD", "SYNCDI", "PREST", "PRESTI", "SYNCID", "SYNCIDI", "PUSHGO", "PUSHGOI",
  "OR", "ORI", "ORN", "ORNI", "NOR", "NORI", "XOR", "XORI",
  "AND", "ANDI", "ANDN", "ANDNI", "NAND", "NANDI", "NXOR", "NXORI",
  "BDIF", "BDIFI", "WDIF", "WDIFI", "TDIF", "TDIFI", "ODIF", "ODIFI",
  "MUX", "MUXI", "SADD", "SADDI", "MOR", "MORI", "MXOR", "MXORI",
  "SETH", "SETMH", "SETML", "SETL", "INCH", "INCMH", "INCML", "INCL",
  "ORH", "ORMH", "ORML", "ORL", "ANDNH", "ANDNMH", "ANDNML", "ANDNL",
  "JMP", "JMPB", "PUSHJ", "PUSHJB", "GETA", "GETAB", "PUT", "PUTI",
  "POP", "RESUME", "SAVE", "UNSAVE", "SYNC", "SWYM", "GET", "TRIP"};
49. And here is a (likewise boring) list of all the internal opcodes. The smallest
numbers, less than or equal to max_pipe_op, correspond to operations for which
arbitrary pipeline delays can be configured with MMIX_config. The largest numbers,
greater than max_real_command, correspond to internally generated operations that
have no official OP code; for example, there are internal operations to shift the γ
pointer in the register stack, and to compute page table entries.
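
Because the classification depends only on the position of an opcode in the enumeration, it reduces to two integer comparisons. The sketch below assumes the numeric values listed in the mini-indexes of this chapter (div = 9, feps = 21, sub = 31, trip = 83); the constant and function names are hypothetical.

```c
/* Boundaries taken from the enumeration order: feps = 21, trip = 83. */
enum { MAX_PIPE_OP_SKETCH = 21, MAX_REAL_COMMAND_SKETCH = 83 };

/* Nonzero if arbitrary pipeline delays can be configured for i. */
static int has_configurable_pipeline(int i)
{
  return i <= MAX_PIPE_OP_SKETCH;
}

/* Nonzero if i is generated internally and has no official OP code. */
static int is_internal_only(int i)
{
  return i > MAX_REAL_COMMAND_SKETCH;
}
```

For example, div (9) falls in the configurable group, sub (31) does not, and the first code after trip, which is incgamma (84) by enumeration order, is internal only.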

( Declare mmix_opcode and internal_opcode 47 ) +=
#define max_pipe_op feps
#define max_real_command trip

typedef enum {
  mul0, /* multiplication by zero */
  mul1, mul2, mul3, mul4, mul5, mul6, mul7, mul8,
        /* multiplication by 1-8, 9-16, ..., 57-64 bits */
  div, /* DIV[U][I] */
  sh, /* S[L,R][U][I] */
  mux, /* MUX[I] */
  sadd, /* SADD[I] */
  mor, /* M[X]OR[I] */
  fadd, /* FADD, FSUB */
  fmul, /* FMUL */
  fdiv, /* FDIV */
  fsqrt, /* FSQRT */
  fint, /* FINT */
  fix, /* FIX[U] */
  flot, /* [S]FLOT[U][I] */
  feps, /* FCMPE, FUNE, FEQLE */
  fcmp, /* FCMP */
  funeq, /* FUN, FEQL */
  fsub, /* FSUB */
  frem, /* FREM */
  mul, /* MUL[I] */
  mulu, /* MULU[I] */
  divu, /* DIVU[I] */
  add, /* ADD[I] */
  addu, /* [2,4,8,16,]ADDU[I], INC[M][H,L] */
  sub, /* SUB[I], NEG[I] */
  subu, /* SUBU[I], NEGU[I] */
  set, /* SET[M][H,L], GETA[B] */
  or, /* OR[I], OR[M][H,L] */
  orn, /* ORN[I] */
  nor, /* NOR[I] */
  and, /* AND[I] */
  andn, /* ANDN[I], ANDN[M][H,L] */
  nand, /* NAND[I] */
  xor, /* XOR[I] */
  nxor, /* NXOR[I] */
  shlu, /* SLU[I] */
  shru, /* SRU[I] */
  shl, /* SL[I] */
  shr, /* SR[I] */
  cmp, /* CMP[I] */
  cmpu, /* CMPU[I] */
  bdif, /* BDIF[I] */
  wdif, /* WDIF[I] */
  tdif, /* TDIF[I] */
  odif, /* ODIF[I] */
  zset, /* ZS[N][N,Z,P][I], ZSEV[I], ZSOD[I] */
  cset, /* CS[N][N,Z,P][I], CSEV[I], CSOD[I] */
  get, /* GET */
  put, /* PUT[I] */
  ld, /* LD[B,W,T,O][U][I], LDHT[I], LDSF[I] */
  ldptp, /* load page table pointer */
  ldpte, /* load page table entry */
  ldunc, /* LDUNC[I] */
  ldvts, /* LDVTS[I] */
  preld, /* PRELD[I] */
  prest, /* PREST[I] */
  st, /* STO[U][I], STCO[I], STUNC[I] */
  syncd, /* SYNCD[I] */
  syncid, /* SYNCID[I] */
  pst, /* ST[B,W,T][U][I], STHT[I] */
  stunc, /* STUNC[I], in write buffer */
  cswap, /* CSWAP[I] */
  br, /* B[N][N,Z,P][B] */
  pbr, /* PB[N][N,Z,P][B] */
  pushj, /* PUSHJ[B] */
  go, /* GO[I] */
  prego, /* PREGO[I] */
  pushgo, /* PUSHGO[I] */
  pop, /* POP */
  resume, /* RESUME */
  save, /* SAVE */
  unsave, /* UNSAVE */
  sync, /* SYNC */
  jmp, /* JMP[B] */
  noop, /* SWYM */
  trap, /* TRAP */
  trip, /* TRIP */
  incgamma, /* increase γ pointer */
  decgamma, /* decrease γ pointer */
  incrl, /* increase rL and β */
  sav, /* intermediate stage of SAVE */
  unsav, /* intermediate stage of UNSAVE */
  resum /* intermediate stage of RESUME */
} internal_opcode;



MMIX_config: void (), MMIX-CONFIG §38.
50. ( Global variables 20 ) +=
char *internal_op_name[] = {"mul0", "mul1", "mul2", "mul3", "mul4", "mul5", "mul6",
  "mul7", "mul8", "div", "sh", "mux", "sadd", "mor", "fadd", "fmul", "fdiv",
  "fsqrt", "fint", "fix", "flot", "feps", "fcmp", "funeq", "fsub", "frem", "mul",
  "mulu", "divu", "add", "addu", "sub", "subu", "set", "or", "orn", "nor", "and",
  "andn", "nand", "xor", "nxor", "shlu", "shru", "shl", "shr", "cmp", "cmpu",
  "bdif", "wdif", "tdif", "odif", "zset", "cset", "get", "put", "ld", "ldptp",
  "ldpte", "ldunc", "ldvts", "preld", "prest", "st", "syncd", "syncid", "pst",
  "stunc", "cswap", "br", "pbr", "pushj", "go", "prego", "pushgo", "pop", "resume",
  "save", "unsave", "sync", "jmp", "noop", "trap", "trip", "incgamma", "decgamma",
  "incrl", "sav", "unsav", "resum"};

51. We need a table to convert the external opcodes to internal ones.

( Global variables 20 ) +=
internal_opcode internal_op[256] = {
  trap, fcmp, funeq, funeq, fadd, fix, fsub, fix,
  flot, flot, flot, flot, flot, flot, flot, flot,
  fmul, feps, feps, feps, fdiv, fsqrt, frem, fint,
  mul, mul, mulu, mulu, div, div, divu, divu,
  add, add, addu, addu, sub, sub, subu, subu,
  addu, addu, addu, addu, addu, addu, addu, addu,
  cmp, cmp, cmpu, cmpu, sub, sub, subu, subu,
  shl, shl, shlu, shlu, shr, shr, shru, shru,
  br, br, br, br, br, br, br, br,
  br, br, br, br, br, br, br, br,
  pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
  pbr, pbr, pbr, pbr, pbr, pbr, pbr, pbr,
  cset, cset, cset, cset, cset, cset, cset, cset,
  cset, cset, cset, cset, cset, cset, cset, cset,
  zset, zset, zset, zset, zset, zset, zset, zset,
  zset, zset, zset, zset, zset, zset, zset, zset,
  ld, ld, ld, ld, ld, ld, ld, ld,
  ld, ld, ld, ld, ld, ld, ld, ld,
  ld, ld, ld, ld, cswap, cswap, ldunc, ldunc,
  ldvts, ldvts, preld, preld, prego, prego, go, go,
  pst, pst, pst, pst, pst, pst, pst, pst,
  pst, pst, pst, pst, st, st, st, st,
  pst, pst, pst, pst, st, st, st, st,
  syncd, syncd, prest, prest, syncid, syncid, pushgo, pushgo,
  or, or, orn, orn, nor, nor, xor, xor,
  and, and, andn, andn, nand, nand, nxor, nxor,
  bdif, bdif, wdif, wdif, tdif, tdif, odif, odif,
  mux, mux, sadd, sadd, mor, mor, mor, mor,
  set, set, set, set, addu, addu, addu, addu,
  or, or, or, or, andn, andn, andn, andn,
  jmp, jmp, pushj, pushj, set, set, put, put,
  pop, resume, save, unsave, sync, noop, get, trip};
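
The table is indexed directly by the OP byte of an instruction. As an illustration restricted to the example given in section 44 (SUB, SUBI, NEG, and NEGI all map to sub), here is a minimal sketch; it covers only those four opcodes, returns -1 for everything else, and uses hypothetical names. The opcode values are the positions of these names in the list of section 47, and sub = 31 comes from the enumeration of section 49.

```c
/* Positions in the opcode list of section 47 (eight opcodes per row). */
enum { OP_SUB = 0x24, OP_SUBI = 0x25, OP_NEG = 0x34, OP_NEGI = 0x35 };

/* Internal opcode number of sub in the enumeration of section 49. */
enum { I_SUB = 31 };

/* Excerpt of the external-to-internal conversion: only the four
   opcodes above are handled; all others yield -1 in this sketch,
   whereas the full 256-entry table has an entry for every OP byte. */
static int internal_of_sketch(unsigned char op)
{
  switch (op) {
  case OP_SUB: case OP_SUBI: case OP_NEG: case OP_NEGI:
    return I_SUB;
  default:
    return -1;
  }
}
```

The full table makes the conversion a single array access, which is why the simulator stores it as data instead of as control flow.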






add = 29, §49. 
addu = 30, §49. 
and = 37, §49. 
andn = 38, §49. 
bdif = 48, §49. 
br = 69, §49. 
cmp = 46, §49. 
cmpu = 47, §49. 
cset = 53, §49. 
cswap = 68, §49. 
div = 9, §49. 
divu = 28, §49. 
fadd = 14, §49. 
fcmp = 22, §49. 
fdiv = 16, §49. 
feps = 21, §49. 
fint = 18, §49. 
fix = 19, §49. 
flot = 20, §49.
fmul = 15, §49.
frem = 25, §49.
fsqrt = 17, §49.
fsub = 24, §49.
funeq = 23, §49. 
get = 54, §49. 



go = 72, §49.
internal_opcode = enum, §49.
jmp = 80, §49.
ld = 56, §49.
ldunc = 59, §49.
ldvts = 60, §49.
mor = 13, §49. 
mul = 26, §49. 
mulu = 27, §49. 
mux = 11, §49. 
nand = 39, §49. 
noop = 81, §49. 
nor = 36, §49. 
nxor = 41, §49. 
odif = 51, §49. 
or = 34, §49. 
orn = 35, §49. 
pbr = 70, §49. 
pop = 75, §49. 
prego = 73, §49. 
preld = 61, §49. 
prest = 62, §49. 
pst = 66, §49. 



pushgo = 74, §49. 
pushj = 71, §49. 
put = 55, §49. 
resume = 76, §49. 
sadd = 12, §49. 
save = 77, §49. 
set = 33, §49. 

shl = 44, §49.
shlu = 42, §49.
shr = 45, §49. 
shru = 43, §49. 
st = 63, §49. 
sub = 31, §49. 
subu = 32, §49. 
sync = 79, §49. 
syncd = 64, §49.
syncid = 65, §49. 
tdif = 50, §49. 
trap = 82, §49. 
trip = 83, §49. 
unsave = 78, §49. 
wdif = 49, §49. 
xor = 40, §49. 
zset = 52, §49.

52. While we’re into boring lists, we might as well define all the special register
numbers, together with an inverse table for use in diagnostic outputs. These codes 
have been designed so that special registers 0-7 are unencumbered, 9-11 can’t be PUT 
by anybody, 8 and 12-18 can’t be PUT by the user. Pipeline delays might occur when 
GET is applied to special registers 21-31 or when PUT is applied to special registers 
8 or 15-20. The SAVE and UNSAVE commands store and restore special registers 0-6 
and 23-27. 



( Header definitions 6 ) += 



#define rA 21 /* arithmetic status register */
#define rB 0 /* bootstrap register (trip) */
#define rC 8 /* continuation register */
#define rD 1 /* dividend register */
#define rE 2 /* epsilon register */
#define rF 22 /* failure location register */
#define rG 19 /* global threshold register */
#define rH 3 /* himult register */
#define rI 12 /* interval counter */
#define rJ 4 /* return-jump register */
#define rK 15 /* interrupt mask register */
#define rL 20 /* local threshold register */
#define rM 5 /* multiplex mask register */
#define rN 9 /* serial number */
#define rO 10 /* register stack offset */
#define rP 23 /* prediction register */
#define rQ 16 /* interrupt request register */
#define rR 6 /* remainder register */
#define rS 11 /* register stack pointer */
#define rT 13 /* trap address register */
#define rU 17 /* usage counter */
#define rV 18 /* virtual translation register */
#define rW 24 /* where-interrupted register (trip) */
#define rX 25 /* execution register (trip) */
#define rY 26 /* Y operand (trip) */
#define rZ 27 /* Z operand (trip) */
#define rBB 7 /* bootstrap register (trap) */
#define rTT 14 /* dynamic trap address register */
#define rWW 28 /* where-interrupted register (trap) */
#define rXX 29 /* execution register (trap) */
#define rYY 30 /* Y operand (trap) */
#define rZZ 31 /* Z operand (trap) */


53. ( Global variables 20 ) +=
char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
  "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL",
  "rA", "rF", "rP", "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};



54. Here are the bit codes that affect trips and traps. The first eight cases also 
apply to the upper half of rQ; the next eight apply to rA. 

#define P_BIT (1 << 0) /* instruction in privileged location */
#define S_BIT (1 << 1) /* security violation */
#define B_BIT (1 << 2) /* instruction breaks the rules */
#define K_BIT (1 << 3) /* instruction for kernel only */
#define N_BIT (1 << 4) /* virtual translation bypassed */
#define PX_BIT (1 << 5) /* permission lacking to execute from page */
#define PW_BIT (1 << 6) /* permission lacking to write on page */
#define PR_BIT (1 << 7) /* permission lacking to read from page */
#define PROT_OFFSET 5 /* distance from PR_BIT to protection code position */
#define X_BIT (1 << 8) /* floating inexact */
#define Z_BIT (1 << 9) /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define H_BIT (1 << 16) /* trip handler bit */
#define F_BIT (1 << 17) /* forced trap bit */
#define E_BIT (1 << 18) /* external (dynamic) trap bit */



( Global variables 20 ) += 

char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

55. ( Internal prototypes 13 ) +=
static void print_bits ARGS((int));

56. ( Subroutines 14 ) +=
static void print_bits(x)
  int x;
{
  register int b, j;
  for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
    if (x & b) printf("%c", bit_code_map[j]);
}
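
To see how the loop walks from E_BIT downward and stops as soon as no lower bits remain, here is a small stand-alone variant that writes the code letters into a caller-supplied buffer instead of printing them. The name bits_to_string and the buffer interface are assumptions made for this sketch; the loop itself is the one used by print_bits.

```c
#include <stddef.h>

#define E_BIT (1 << 18) /* highest bit code, as defined in section 54 */

/* One letter per bit, from bit 18 (E) down to bit 0 (p) */
static const char bit_code_map[] = "EFHDVWIOUZXrwxnkbsp";

/* Hypothetical variant of print_bits: store the letters for the bits set in x
   into buf (which must hold at least 20 chars) and return how many were stored */
static size_t bits_to_string(int x, char *buf)
{
  size_t n = 0;
  int b, j;
  /* (x & (b + b - 1)) is nonzero while x still has a bit at or below b */
  for (j = 0, b = E_BIT; (x & (b + b - 1)) && b; j++, b >>= 1)
    if (x & b) buf[n++] = bit_code_map[j];
  buf[n] = '\0';
  return n;
}
```

For instance, an argument with only the integer-overflow and floating-inexact bits set yields the two letters V and X, in that order, because the scan proceeds from the most significant code to the least.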

57. The lower half of rQ holds external interrupts of highest priority. Most of them 
are implementation-dependent, but a few are defined in general. 

( Header definitions 6 ) += 

#define POWER_FAILURE (1 << 0) /* try to shut down calmly and quickly */
#define PARITY_ERROR (1 << 1) /* try to save the file systems */
#define NONEXISTENT_MEMORY (1 << 2) /* a memory address can’t be used */
#define REBOOT_SIGNAL (1 << 4) /* it’s time to start over */
#define INTERVAL_TIMEOUT (1 << 6) /* the timer register, rI, has reached zero */
#define STACK_OVERFLOW (1 << 7) /* data has been stored on the rC page */



ARGS = macro, §6.
printf: int (), <stdio.h>.

58. Dynamic speculation. Now that we understand some basic low-level structures,
we’re ready to look at the larger picture.

This simulator is based on the idea of “dynamic scheduling with register renam- 
ing,” as introduced in the 1960s by R. M. Tomasulo [IBM Journal of Research and 
Development 11 (1967), 25-33]. Moreover, the dynamic scheduling method is ex- 
tended here to “speculative execution,” as implemented in several processors of the 
1990s and described in section 4.6 of Hennessy and Patterson’s Computer Architec- 
ture, second edition (1995). The essential idea is to keep track of the pipeline contents 
by recording all dependencies between unfinished computations in a queue called the 
reorder buffer. An entry in the reorder buffer might, for example, correspond to an
instruction that adds together two numbers whose values are still being computed; 
those numbers have been allocated space in earlier positions of the reorder buffer. The 
addition will take place as soon as both of its operands are known, but the sum won’t 
be written immediately into the destination register. It will stay in the reorder buffer 
until reaching the hot seat at the front of the queue. Finally, the addition leaves the 
hot seat and is said to be committed. 

Some instructions in the reorder buffer may in fact be executed only on speculation, 
meaning that they won’t really be called for unless a prior branch instruction has the 
predicted outcome. Indeed, we can say that all instructions not yet in the hot seat are 
being executed speculatively, because an external interrupt might occur at any time 
and change the entire course of computation. Organizing the pipeline as a reorder 
buffer allows us to look ahead and keep busy computing values that have a good 
chance of being needed later, instead of waiting for slow instructions or slow memory 
references to be completed. 

The reorder buffer is in fact a queue of control records, conceptually forming part 
of a circle of such records inside the simulator, corresponding to all instructions that 
have been dispatched or issued but not yet committed, in strict program order. 

The best way to get an understanding of speculative execution is perhaps to imagine 
that the reorder buffer is large enough to hold hundreds of instructions in various 
stages of execution, and to think of an implementation of MMIX that has dozens of 
functional units — more than would ever actually be built into a chip. Then one 
can readily visualize the kinds of control structures and checks that must be made to 
ensure correct execution. Without such a broad viewpoint, a programmer or hardware 
designer will be inclined to think only of the simple cases and to devise algorithms 
that lack the proper generality. Thus we have a somewhat paradoxical situation in 
which a difficult general problem turns out to be easier to solve than its simpler special 
cases, because it enforces clarity of thinking. 

Instructions that have completed execution and have not yet been committed are 
analogous to cars that have gone through our hypothetical repair shop and are waiting 
for their owners to pick them up. However, all analogies break down, and the world 
of automobiles does not have a natural counterpart for the notion of speculative 
execution. That notion corresponds roughly to situations in which people are led to 
believe that their cars need a new piece of equipment, but they suddenly change their 
mind once they see the price tag, and they insist on having the equipment removed 
even after it has been partially or completely installed. 

Speculatively executed instructions might make no sense: They might divide by
zero or refer to protected memory areas, etc. Such anomalies are not considered 
catastrophic or even exceptional until the instruction reaches the hot seat. 

The person who designs a computer with speculative execution is an optimist, who 
has faith that the vast majority of the machine’s predictions will come true. The 
person who designs a reliable implementation of such a computer is a pessimist, who 
understands that all predictions might come to naught. The pessimist does, however, 
take pains to optimize the cases that do turn out well. 

59. Let’s consider what happens to a single instruction, say ADD $1,$2,$3, as it
travels through the pipeline in a normal situation. The first time this instruction is
encountered, it is placed into the I-cache (that is, the instruction cache), so that we 
won’t have to access memory when we need to perform it again. We will assume for 
simplicity in this discussion that each I-cache access takes one clock cycle, although 
other possibilities are allowed by MMIX_config.

Suppose the simulated machine fetches the example ADD instruction at time 1000. 
Fetching is done by a coroutine whose stage number is 0. A cache block typically 
contains 8 or 16 instructions. The fetch unit of our machine is able to fetch up to 
fetch_max instructions on each clock cycle and place them in the fetch buffer, provided
that there is room in the buffer and that all the instructions belong to the same cache 
block. 

The dispatch unit of our simulator is able to issue up to dispatch_max instructions
on each clock cycle and move them from the fetch buffer to the reorder buffer, provided 
that functional units are available for those instructions and there is room in the 
reorder buffer. A functional unit that handles ADD is usually called an ALU (arithmetic 
logic unit), and our simulated machine might have several of them. If they aren’t all 
stalled in stage 1 of their pipelines, and if the reorder buffer isn’t full, and if the 
machine isn’t in the process of deissuing instructions that were mispredicted, and if 
fewer than dispatch_max instructions are ahead of the ADD in the fetch buffer, and if
all such prior instructions can be issued without using up all the free ALUs, our ADD 
instruction will be issued at time 1001. (In fact, all of these conditions are usually 
true.) 

We assume that L > 3, so that $1, $2, and $3 are local registers. For simplicity we’ll
assume in fact that the register stack is empty, so that the ADD instruction is supposed
to set l[1] ← l[2] + l[3]. The operands l[2] and l[3] might not be known at time 1001;
they are spec values, which might point to specnode entries in the reorder buffer for
previous instructions whose destinations are l[2] and l[3]. The dispatcher fills the next
available control block of the reorder buffer with information for the ADD, containing
appropriate spec values corresponding to l[2] and l[3] in its y and z fields. The x field
of this control block will be inserted into a doubly linked list of specnode records,
corresponding to l[1] and to all instructions in the reorder buffer that have l[1] as
a destination. The boolean value x.known will be set to false, meaning that this
speculative value still needs to be computed. Subsequent instructions that need l[1]
as a source will point to x, if they are issued before the sum x.o has been computed.
Double linking is used in the specnode list because the ADD instruction might be
cancelled before it is finally committed; thus deletions might occur at either end of
the list for l[1].

At time 1002, the ALU handling the ADD will stall if its inputs y and z are not
both known (namely if y.p ≠ Λ or z.p ≠ Λ). In fact, it will also stall if its third
input rA is not known; the current speculative value of rA, except for its event bits,
is represented in the ra field of the control block, and we must have ra.p = Λ. In
such a case the ALU will look to see if the spec values pointed to by y.p and/or
z.p and/or ra.p become defined on this clock cycle, and it will update its own input
values accordingly.

But let’s assume that y, z, and ra are already known at time 1002. Then x.o will
be set to y.o + z.o and x.known will become true. This will make the result destined
for l[1] available to be used in other commands at time 1003.

If no overflow occurs when adding y.o to z.o, the interrupt and arith_exc fields of
the control block for ADD are set to zero. But when overflow does occur (shudder), 
there are two cases, based on the V-enable bit of rA, which is found in field b.o of the 
control block. If this bit is 0, the V-bit of the arith_exc field in the control block is set 
to 1; the arith_exc field will be ored into rA when the ADD instruction is eventually 
committed. But if the V-enable bit is 1, the trip handler should be called, interrupting 
the normal sequence. In such a case, the interrupt field of the control block is set to 
specify a trip, and the fetcher and dispatcher are told to forget what they have been 
doing; all instructions following the ADD in the reorder buffer must now be deissued. 
The virtual starting address of the overflow trip handler, namely location 32, is hastily 
passed to the fetch routine, and instructions will be fetched from that location as soon 
as possible. (Of course the overflow and the trip handler are still speculative until 
the ADD instruction is committed. Other exceptional conditions might cause the ADD 
itself to be terminated before it gets to the hot seat. But the pipeline keeps charging 
ahead, always trying to guess the most probable outcome.) 
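
The two overflow cases can be sketched in a few lines of stand-alone C. This is only an illustration under stated assumptions, not the simulator’s code: the struct add_outcome, the function sketch_add, the use of H_BIT to mark a requested trip, and the placement of the recorded event at V_BIT >> 8 (the event byte of rA sits eight bits below the enable byte) are all choices made for this sketch.

```c
#include <stdint.h>

#define V_BIT (1u << 14)     /* integer overflow, enable position (section 54) */
#define H_BIT (1u << 16)     /* trip handler bit (section 54) */
#define OVERFLOW_TRIP_LOC 32 /* virtual address of the overflow trip handler */

/* Hypothetical summary of what the ALU leaves in the control block */
typedef struct {
  uint64_t x;             /* the sum, called x.o in the text */
  unsigned int arith_exc; /* event bits to be ORed into rA at commit time */
  unsigned int interrupt; /* nonzero when a trip has been requested */
  uint64_t refetch;       /* where fetching should restart after a trip, else 0 */
} add_outcome;

static add_outcome sketch_add(uint64_t y, uint64_t z, unsigned int rA)
{
  add_outcome r = {0, 0, 0, 0};
  r.x = y + z; /* unsigned addition wraps modulo 2^64 */
  /* signed overflow: y and z agree in sign, but the sum's sign differs */
  if ((~(y ^ z) & (y ^ r.x)) >> 63) {
    if (rA & V_BIT) {              /* V-enable bit is 1: request the trip */
      r.interrupt = H_BIT;
      r.refetch = OVERFLOW_TRIP_LOC;
    } else                         /* V-enable bit is 0: just note the event */
      r.arith_exc = V_BIT >> 8;
  }
  return r;
}
```

Either way the outcome stays speculative; nothing here touches rA itself, which matches the rule that architectural state changes only when the instruction is committed.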

The commission unit of this simulator is able to commit and/or deissue up to 
commit_max instructions on each clock cycle. With luck, fewer than commit_max
instructions will be ahead of our ADD instruction at time 1003, and they will all be
completed normally. Then l[1] can be set to x.o, and the event bits of rA can be
updated from arith_exc, and the ADD command can pass through the hot seat and 
out of the reorder buffer. 

( External variables 4 ) +=
Extern int fetch_max, dispatch_max, peekahead, commit_max;
  /* limits on instructions that can be handled per clock cycle */



arith_exc: unsigned int, §44.
b: spec, §44.
Extern = macro, §4.
false = 0, §11.
interrupt: unsigned int, §44.
known: bool, §40.
MMIX_config: void (), MMIX-CONFIG §38.
o: octa, §40.
p: specnode *, §40.
ra: spec, §44.
stage: int, §23.
true = 1, §11.
x: specnode, §44.
y: spec, §44.
z: spec, §44.

60. The instruction currently occupying the hot seat is the only
issued-but-not-yet-committed instruction that is guaranteed to be truly essential to the machine’s
computation. All other instructions in the reorder buffer are being executed on 
speculation; if they prove to be needed, well and good, but we might want to jettison 
them all if, say, an external interrupt occurs. 

Thus all instructions that change the global state in complicated ways — like LDVTS, 
which changes the virtual address translation caches — are performed only when they 
reach the hot seat. Fortunately the vast majority of instructions are sufficiently simple 
that we can deal with them more efficiently while other computations are taking place. 

In this implementation the reorder buffer is simply housed in an array of control
records. The first array element is reorder_bot, and the last is reorder_top. Variable
hot points to the control block in the hot seat, and hot - 1 to its predecessor, etc.
Variable cool points to the next control block that will be filled in the reorder buffer.
If hot = cool the reorder buffer is empty; otherwise it contains the control records
hot, hot - 1, . . . , cool + 1, except of course that we wrap around from reorder_bot to
reorder_top when moving down in the buffer.
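
The hot/cool convention can be tried out on a toy ring. The sketch below is a minimal model, not the simulator’s data structure: the control record is reduced to a single id field, the ring has only four slots, and the helpers ring_prev, ring_count, ring_issue, and ring_commit are hypothetical names. The fullness test (refusing to let cool catch up with hot) is an assumption made so that a full ring cannot be mistaken for an empty one.

```c
#include <stddef.h>

typedef struct { int id; } control; /* stand-in for the real control record */

#define RING_SIZE 4
static control ring[RING_SIZE];
static control *reorder_bot = &ring[0], *reorder_top = &ring[RING_SIZE - 1];
static control *hot = &ring[RING_SIZE - 1], *cool = &ring[RING_SIZE - 1];

/* Move one position down in the ring, wrapping from reorder_bot to reorder_top */
static control *ring_prev(control *p)
{
  return p == reorder_bot ? reorder_top : p - 1;
}

/* Number of occupied records hot, hot-1, ..., cool+1 */
static int ring_count(void)
{
  int n = 0;
  control *p;
  for (p = hot; p != cool; p = ring_prev(p)) n++;
  return n;
}

/* Hypothetical issue step: fill the cool block, then advance cool.
   Returns 0 when the ring has no room. */
static int ring_issue(int id)
{
  control *new_cool = ring_prev(cool);
  if (new_cool == hot) return 0; /* would make a full ring look empty */
  cool->id = id;
  cool = new_cool;
  return 1;
}

/* Hypothetical commit step: remove the record in the hot seat.
   Returns its id, or -1 when the ring is empty. */
static int ring_commit(void)
{
  int id;
  if (hot == cool) return -1;
  id = hot->id;
  hot = ring_prev(hot);
  return id;
}
```

Records leave in the same order in which they were issued, which is the in-order commitment property that the reorder buffer exists to provide.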

( External variables 4 ) +=
Extern control *reorder_bot, *reorder_top;
  /* least and greatest entries in the ring containing the reorder buffer */
Extern control *hot, *cool; /* front and rear of the reorder buffer */
Extern control *old_hot; /* value of hot at beginning of cycle */
Extern int deissues; /* the number of instructions that need to be deissued */

61. ( Initialize everything 22 ) +=
hot = cool = reorder_top; deissues = 0;

62. ( Internal prototypes 13 ) +=
static void print_reorder_buffer ARGS((void));

63. ( Subroutines 14 ) +=
static void print_reorder_buffer()
{
  printf("Reorder buffer");
  if (hot == cool) printf(" (empty)\n");
  else { register control *p;
    if (deissues) printf(" (%d to be deissued)", deissues);
    if (doing_interrupt) printf(" (interrupt state %d)", doing_interrupt);
    printf(":\n");
    for (p = hot; p != cool; p = (p == reorder_bot ? reorder_top : p - 1)) {
      print_control_block(p);
      if (p->owner) {
        printf(" "); print_coroutine_id(p->owner);
      }
      printf("\n");
    }
  }
  printf(" %d available rename register%s, %d memory slot%s\n", rename_regs,
      rename_regs != 1 ? "s" : "", mem_slots, mem_slots != 1 ? "s" : "");
}

64. Here is an overview of what happens on each clock cycle.

( Perform one machine cycle 64 ) =
{
  ( Check for external interrupt 314 );
  dispatch_count = 0;
  old_hot = hot; /* remember the hot seat position at beginning of cycle */
  old_tail = tail; /* remember the fetch buffer contents at beginning of cycle */
  suppress_dispatch = (deissues || dispatch_lock);
  if (doing_interrupt) ( Perform one cycle of the interrupt preparations 318 )
  else ( Commit and/or deissue up to commit_max instructions 67 );
  ( Execute all coroutines scheduled for the current time 125 );
  if (!suppress_dispatch) ( Dispatch one cycle’s worth of instructions 74 );
  ticks = incr(ticks, 1); /* and the beat moves on */
  dispatch_stat[dispatch_count]++;
}

This code is used in section 10. 

65. ( Global variables 20 ) +=
int dispatch_count; /* how many dispatched on this cycle */
bool suppress_dispatch; /* should dispatching be bypassed? */
int doing_interrupt; /* how many cycles of interrupt preparations remain */
lockvar dispatch_lock; /* lock to prevent instruction issues */

66. ( External variables 4 ) +=
Extern int *dispatch_stat; /* how often did we dispatch 0, 1, ... instructions? */
Extern bool security_disabled; /* omit security checks for testing purposes? */



ARGS = macro, §6.
bool = enum, §11.
commit_max: int, §59.
control = struct, §44.
Extern = macro, §4.
incr: octa (), MMIX-ARITH §6.
lockvar = coroutine *, §37.
mem_slots: int, §86.
old_tail: fetch *, §70.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
rename_regs: int, §86.
tail: fetch *, §69.
ticks: Extern octa, §87.

67. ( Commit and/or deissue up to commit_max instructions 67 ) =
{
  for (m = commit_max; m > 0 && deissues > 0; m--)
    ( Deissue the coolest instruction 145 );
  for ( ; m > 0; m--) {
    if (hot == cool) break; /* reorder buffer is empty */
    if (!security_disabled) ( Check for security violation, break if so 149 );
    if (hot->owner) break; /* hot seat instruction isn’t finished */
    ( Commit the hottest instruction, or break if it’s not ready 146 );
    i = hot->i;
    if (hot == reorder_bot) hot = reorder_top;
    else hot--;
    if (i == resum) break; /* allow the resumed instruction to see the new rK */
  }
}

This code is used in section 64. 

68. The dispatch stage. It would be nice to present the parts of this simulator
by dealing with the fetching, dispatching, executing, and committing stages in that 
order. After all, instructions are first fetched, then dispatched, then executed, and 
finally committed. However, the fetch stage depends heavily on difficult questions 
of memory management that are best deferred until we have looked at the simpler 
parts of simulation. Therefore we will take our initial plunge into the details of 
this program by looking first at the dispatch phase, assuming that instructions have 
somehow appeared magically in the fetch buffer. 

The fetch buffer, like the circular priority queue of all coroutines and the circular 
queue used for the reorder buffer, lives in an array that is best regarded as a ring of 
elements. The elements are structures of type fetch, which have five fields: A 32-bit 
inst, which is an MMIX instruction; a 64-bit loc, which is the virtual address of that 
instruction; an interrupt field, which is nonzero if, for example, the protection bits 
in the relevant page table entry for this address do not permit execution access; a 
boolean noted field, which becomes true after the dispatch unit has peeked at the 
instruction to see whether it is a jump or probable branch; and a hist field, which 
records the recent branch history. (The least significant bits of hist correspond to the 
most recent branches.) 

( Type definitions 11 ) +=
typedef struct {
  octa loc; /* virtual address of instruction */
  tetra inst; /* the instruction itself */
  unsigned int interrupt; /* bit codes that might cause interruption */
  bool noted; /* have we peeked at this instruction? */
  unsigned int hist; /* if we peeked, this was the peek_hist */
} fetch;

69. The oldest and youngest entries in the fetch buffer are pointed to by head and
tail, just as the oldest and youngest entries in the reorder buffer are called hot and
cool. The fetch coroutine will be adding entries at the tail position, which starts at
old_tail when a cycle begins, in parallel with the actions simulated by the dispatcher.
Therefore the dispatcher is allowed to look only at instructions in head, head - 1,
. . . , old_tail + 1, although a few more recently fetched instructions will usually be
present in the fetch buffer by the time this part of the program is executed.

( External variables 4 ) +=
Extern fetch *fetch_bot, *fetch_top;
  /* least and greatest entries in the ring containing the fetch buffer */
Extern fetch *head, *tail; /* front and rear of the fetch buffer */



bool = enum, §11.
commit_max: int, §59.
cool: control *, §60.
deissues: int, §60.
Extern = macro, §4.
hot: control *, §60.
i: internal_opcode, §44.
i: register int, §12.
m: register int, §12.
octa = struct, §17.
old_tail: fetch *, §70.
owner: coroutine *, §44.
peek_hist: unsigned int, §99.
reorder_bot: control *, §60.
reorder_top: control *, §60.
resum = 89, §49.
security_disabled: bool, §66.
tetra = unsigned int, §17.
true = 1, §11.

70. ( Global variables 20 ) +=
fetch *old_tail; /* rear of the fetch buffer available on the current cycle */

71. #define UNKNOWN_SPEC ((specnode *) 1)

( Initialize everything 22 ) +=
head = tail = fetch_top;
inst_ptr.p = UNKNOWN_SPEC;

72. ( Internal prototypes 13 ) +=
static void print_fetch_buffer ARGS((void));

73. ( Subroutines 14 ) +=
static void print_fetch_buffer()
{
  printf("Fetch buffer");
  if (head == tail) printf(" (empty)\n");
  else { register fetch *p;
    if (resuming) printf(" (resumption state %d)", resuming);
    printf(":\n");
    for (p = head; p != tail; p = (p == fetch_bot ? fetch_top : p - 1)) {
      print_octa(p->loc);
      printf(": %08x(%s)", p->inst, opcode_name[p->inst >> 24]);
      if (p->interrupt) print_bits(p->interrupt);
      if (p->noted) printf("*");
      printf("\n");
    }
  }
  printf("Instruction pointer is ");
  if (inst_ptr.p == NULL) print_octa(inst_ptr.o);
  else {
    printf("waiting for ");
    if (inst_ptr.p == UNKNOWN_SPEC) printf("dispatch");
    else if (inst_ptr.p->addr.h == (tetra) -1)
      print_coroutine_id(((control *) inst_ptr.p->up)->owner);
    else print_specnode_id(inst_ptr.p->addr);
  }
  printf("\n");
}

74. The best way to understand the dispatching process is once again to “think 
big,” by imagining a huge fetch buffer and the potential ability to issue dozens of 
instructions per cycle, although the actual numbers are typically quite small. 

If the fetch buffer is not empty after dispatch_max instructions have been
dispatched, the dispatcher also looks at up to peekahead further instructions to see if
they are jumps or other commands that change the flow of control. Much of this action
would happen in parallel on a real machine, but our simulator works sequentially.

In the following program, true_head records the head of the fetch buffer as
instructions are actually dispatched, while head refers to the position currently being
examined (possibly peeking into the future).

If the fetch buffer is empty at the beginning of the current clock cycle, a “dispatch
bypass” allows the dispatcher to issue the first instruction that enters the fetch buffer
on this cycle. Otherwise the dispatcher is restricted to previously fetched instructions.

( Dispatch one cycle’s worth of instructions 74 ) =
{ register fetch *true_head, *new_head;
  true_head = head;
  if (head == old_tail && head != tail)
    old_tail = (head == fetch_bot ? fetch_top : head - 1);
  peek_hist = cool_hist;
  for (j = 0; j < dispatch_max + peekahead; j++)
    ( Look at the head instruction, and try to dispatch it if j < dispatch_max 75 );
  head = true_head;
}

This code is used in section 64. 



addr: octa, §40.
ARGS = macro, §6.
control = struct, §44.
cool_hist: unsigned int, §99.
dispatch_max: int, §59.
fetch = struct, §68.
fetch_bot: fetch *, §69.
fetch_top: fetch *, §69.
h: tetra, §17.
head: fetch *, §69.
inst: tetra, §68.
inst_ptr: spec, §284.
interrupt: unsigned int, §68.
j: register int, §12.
loc: octa, §68.
noted: bool, §68.
o: octa, §40.
opcode_name: char *[], §48.
owner: coroutine *, §44.
p: specnode *, §40.
peek_hist: unsigned int, §99.
peekahead: int, §59.
print_bits: static void (), §56.
print_coroutine_id: static void (), §25.
print_octa: static void (), §19.
print_specnode_id: static void (), §91.
printf: int (), <stdio.h>.
resuming: int, §78.
specnode = struct, §40.
tail: fetch *, §69.
tetra = unsigned int, §17.
up: specnode *, §40.

75. ( Look at the head instruction, and try to dispatch it if j < dispatch_max 75 ) =
{
  register mmix_opcode op;
  register int yz, f;
  register bool freeze_dispatch = false;
  register func *u = NULL;
  if (head == old_tail) break; /* fetch buffer empty */
  if (head == fetch_bot) new_head = fetch_top; else new_head = head - 1;
  op = head->inst >> 24; yz = head->inst & 0xffff;
  ( Determine the flags, f, and the internal opcode, i 80 );
  ( Install default fields in the cool block 100 );
  if (f & rel_addr_bit) ( Convert relative address to absolute address 84 );
  if (head->noted) peek_hist = head->hist;
  else ( Redirect the fetch if control changes at this inst 85 );
  if (j >= dispatch_max || dispatch_lock || nullifying) {
    head = new_head; continue; /* can’t dispatch, but can peek ahead */
  }
  if (cool == reorder_bot) new_cool = reorder_top; else new_cool = cool - 1;
  ( Dispatch an instruction to the cool block if possible, otherwise goto stall 101 );
  ( Assign a functional unit if available, otherwise goto stall 82 );
  ( Check for sufficient rename registers and memory slots, or goto stall 111 );
  if ((op & 0xe0) == 0x40) ( Record the result of branch prediction 152 );
  ( Issue the cool instruction 81 );
  cool = new_cool; cool_O = new_O; cool_S = new_S;
  cool_hist = peek_hist; continue;
stall: ( Undo data structures set prematurely in the cool block and break 123 );
}

This code is used in section 74. 
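
The first steps of this section split the 32-bit instruction word into its opcode byte and its YZ field, and later a mask test picks out the opcodes whose branch prediction must be recorded. A minimal stand-alone sketch of that field extraction follows; the struct fields and the helpers split_inst and is_predicted_branch are hypothetical names, and the separate x, y, and z members are added only for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the pieces of one MMIX instruction word */
typedef struct {
  unsigned op; /* leading byte: the opcode */
  unsigned x;  /* second byte */
  unsigned y;  /* third byte */
  unsigned z;  /* fourth byte */
  unsigned yz; /* low 16 bits, as used for relative addresses */
} fields;

static fields split_inst(uint32_t inst)
{
  fields f;
  f.op = inst >> 24;
  f.x = (inst >> 16) & 0xff;
  f.y = (inst >> 8) & 0xff;
  f.z = inst & 0xff;
  f.yz = inst & 0xffff;
  return f;
}

/* The test (op & 0xe0) == 0x40 selects opcodes 0x40 through 0x5f,
   the rows of the internal_op table filled with br and pbr */
static int is_predicted_branch(unsigned op)
{
  return (op & 0xe0) == 0x40;
}
```

With the row layout of the internal_op table, opcode 0x20 falls in the row that begins with add, so the word 0x20010203 decodes to an addition with x = 1, y = 2, z = 3.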

76 . An instruction can be dispatched only if a functional unit is available to handle 
it. A functional unit consists of a 256-bit vector that specifies a subset of MMIX’s 
opcodes, and an array of coroutines for the pipeline stages. There are k coroutines in 
the array, where k is the maximum number of stages needed by any of the opcodes 
supported. 

( Type definitions 11 ) +=

typedef struct func_struct { 

char name [16]; /* symbolic designation */ 

tetra ops [8]; /* big-endian bitmap for the opcodes supported */ 

int k; /* number of pipeline stages */ 

coroutine *co; /* pointer to the first of k consecutive coroutines */ 

} func; 

77. ( External variables 4 ) +=
Extern func *funit; /* pointer to array of functional units */
Extern int funit_count; /* the number of functional units */

78. It is convenient to have a 256-bit vector of all the supported opcodes, because
we need to shut off a lot of special actions when an opcode is not supported.

( Global variables 20 ) +=
control *new_cool; /* the reorder position following cool */
int resuming; /* set nonzero if resuming an interrupted instruction */
tetra support[8]; /* big-endian bitmap for all opcodes supported */

79. ( Initialize everything 22 ) +=
{ register func *u;
  for (u = funit; u < funit + funit_count; u++)
    for (i = 0; i < 8; i++) support[i] |= u->ops[i];
}

80. #define sign_bit ((unsigned) 0x80000000)

( Determine the flags, f, and the internal opcode, i 80 ) =
if (!(support[op >> 5] & (sign_bit >> (op & 31)))) {
  /* oops, this opcode isn’t supported by any functional unit */
  f = flags[TRAP], i = trap;
} else f = flags[op], i = internal_op[op];
if (i == trip && (head->loc.h & sign_bit)) f = 0, i = noop;

This code is used in section 75. 
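
The big-endian bitmap test used here can be exercised in isolation. In the sketch below the word index is op >> 5 and the bit within the word is counted from the most significant end, exactly as in the test above; the helper names set_supported, is_supported, and merge_unit are hypothetical, and tetra is assumed to be a 32-bit unsigned type.

```c
#include <stdint.h>

typedef uint32_t tetra; /* assumption: a tetrabyte is 32 bits wide */
#define sign_bit ((tetra) 0x80000000)

/* Hypothetical helper: mark opcode op (0..255) in a 256-bit big-endian bitmap */
static void set_supported(tetra map[8], unsigned op)
{
  map[op >> 5] |= sign_bit >> (op & 31);
}

/* The membership test used when determining the flags and internal opcode */
static int is_supported(const tetra map[8], unsigned op)
{
  return (map[op >> 5] & (sign_bit >> (op & 31))) != 0;
}

/* OR one functional unit's bitmap into the global one, as in section 79 */
static void merge_unit(tetra support[8], const tetra ops[8])
{
  int i;
  for (i = 0; i < 8; i++) support[i] |= ops[i];
}
```

Because bit 0 of each word is its leftmost bit, opcode 0x20 lands on the sign bit of word 1, and opcode 0xff on the least significant bit of word 7.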



bool = enum, §11.
control = struct, §44.
cool: control *, §60.
cool_hist: unsigned int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
coroutine = struct, §23.
dispatch_lock: lockvar, §65.
dispatch_max: int, §59.
Extern = macro, §4.
false = 0, §11.
fetch_bot: fetch *, §69.
fetch_top: fetch *, §69.
flags: unsigned char [], §83.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §68.
i: register int, §12.
i: register int, §10.
inst: tetra, §68.
internal_op: internal_opcode [], §51.
j: register int, §12.
loc: octa, §68.
mmix_opcode = enum, §47.
new_head: register fetch *, §74.
new_O: octa, §99.
new_S: octa, §99.
noop = 81, §49.
noted: bool, §68.
nullifying: bool, §315.
old_tail: fetch *, §70.
peek_hist: unsigned int, §99.
rel_addr_bit = §83.
reorder_bot: control *, §60.
reorder_top: control *, §60.
tetra = unsigned int, §17.
trap = 82, §49.
TRAP = #00, §47.
trip = 83, §49.
81. ⟨Issue the cool instruction 81⟩ ≡

if (cool->interim) {
  cool->usage = false;
  if (cool->op == SAVE) ⟨Get ready for the next step of SAVE 341⟩
  else if (cool->op == UNSAVE) ⟨Get ready for the next step of UNSAVE 335⟩
  else if (cool->i == preld || cool->i == prest)
    ⟨Get ready for the next step of PRELD or PREST 228⟩
  else if (cool->i == prego) ⟨Get ready for the next step of PREGO 229⟩
}
else if (cool->i <= max_real_command) {
  if ((flags[cool->op] & ctl_change_bit) || cool->i == pbr)
    if (inst_ptr.p == NULL && (inst_ptr.o.h & sign_bit) && !(cool->loc.h & sign_bit)
        && cool->i != trap)
      cool->interrupt |= P_BIT; /* jumping from nonnegative to negative */
  true_head = head = new_head; /* delete instruction from fetch buffer */
  resuming = 0;
}
if (freeze_dispatch) set_lock(u->co, dispatch_lock);
cool->owner = u->co; u->co->ctl = cool;
startup(u->co, 1); /* schedule execution of the new inst */
if (verbose & issue_bit) {
  printf("Issuing "); print_control_block(cool);
  printf(" "); print_coroutine_id(u->co); printf("\n");
}
dispatch_count++;

This code is used in section 75.

82. We assign the first functional unit that supports op and is totally unoccupied, 
if possible; otherwise we assign the first functional unit that supports op and has 
stage 1 unoccupied. 

⟨Assign a functional unit if available, otherwise goto stall 82⟩ ≡

{ register int t = op >> 5, b = sign_bit >> (op & 31);
  if (cool->i == trap && op != TRAP) { /* opcode needs to be emulated */
    u = funit + funit_count; /* this unit supports just TRIP and TRAP */
    goto unit_found;
  }
  for (u = funit; u <= funit + funit_count; u++)
    if (u->ops[t] & b) {
      for (i = 0; i < u->k; i++)
        if (u->co[i].next) goto unit_busy;
      goto unit_found;
    unit_busy: ;
    }
  for (u = funit; u < funit + funit_count; u++)
    if ((u->ops[t] & b) && (u->co->next == NULL)) goto unit_found;
  goto stall; /* all units for this op are busy */
}
unit_found:

This code is used in section 75.
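The two-pass policy described in section 82 can be sketched without the coroutine machinery. This is only an illustration under simplifying assumptions: toy_unit and pick_unit are hypothetical names, a plain array of busy flags stands in for the co[i].next pointers, and the special TRIP/TRAP unit is left out.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t ops[8];   /* big-endian opcode bitmap */
    int k;             /* number of pipeline stages */
    const int *busy;   /* busy[i] nonzero if stage i+1 is occupied */
} toy_unit;

/* Return the index of the chosen unit, or -1 to signal a stall. */
static int pick_unit(const toy_unit *units, int count, unsigned op) {
    unsigned t = op >> 5;
    uint32_t b = 0x80000000u >> (op & 31);
    /* First pass: a supporting unit with every stage idle. */
    for (int u = 0; u < count; u++) {
        if (!(units[u].ops[t] & b)) continue;
        int idle = 1;
        for (int i = 0; i < units[u].k; i++)
            if (units[u].busy[i]) { idle = 0; break; }
        if (idle) return u;
    }
    /* Second pass: a supporting unit whose first stage is free. */
    for (int u = 0; u < count; u++)
        if ((units[u].ops[t] & b) && !units[u].busy[0]) return u;
    return -1;
}
```

Preferring a completely idle unit spreads work across units before any pipeline is asked to overlap two instructions.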
co: coroutine *, §76.
cool: control *, §60.
ctl: control *, §23.
ctl_change_bit = 0x80, §83.
dispatch_count: int, §65.
dispatch_lock: lockvar, §65.
false = 0, §11.
flags: unsigned char [], §83.
freeze_dispatch: register bool, §75.
funit: func *, §77.
funit_count: int, §77.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
i: register int, §12.
inst_ptr: spec, §284.
interim: bool, §44.
interrupt: unsigned int, §44.
issue_bit = 1 << 0, §8.
k: int, §76.
loc: octa, §44.
max_real_command = trip, §49.
new_head: register fetch *, §74.
next: coroutine *, §23.
o: octa, §40.
op: mmix_opcode, §44.
op: register mmix_opcode, §75.
ops: tetra [], §76.
owner: coroutine *, §44.
p: specnode *, §40.
P_BIT = 1 << 0, §54.
pbr = 70, §49.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
resuming: int, §78.
SAVE = 0xfa, §47.
set_lock = macro (), §37.
sign_bit = macro, §80.
stall: label, §75.
startup: static void (), §31.
trap = 82, §49.
TRAP = 0x00, §47.
true_head: register fetch *, §74.
u: register func *, §75.
UNSAVE = 0xfb, §47.
usage: bool, §44.
verbose: int, §4.
83. The flags table records special properties of each operation code in binary
notation: 0x1 means Z is an immediate value, 0x2 means rZ is a source operand,
0x4 means Y is an immediate value, 0x8 means rY is a source operand, 0x10 means rX
is a source operand, 0x20 means rX is a destination, 0x40 means YZ is part of a relative
address, 0x80 means the control changes at this point.

#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define ctl_change_bit 0x80



⟨Global variables 20⟩ +≡

unsigned char flags[256] = {
  0x8a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26, /* TRAP, ... */
  0x26, 0x25, 0x26, 0x25, 0x26, 0x25, 0x26, 0x25, /* FLOT, ... */
  0x2a, 0x2a, 0x2a, 0x2a, 0x2a, 0x26, 0x2a, 0x26, /* FMUL, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* MUL, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ADD, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* 2ADDU, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x26, 0x25, 0x26, 0x25, /* CMP, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* SL, ... */
  0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* BN, ... */
  0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* BNN, ... */
  0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* PBN, ... */
  0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, 0x50, /* PBNN, ... */
  0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, /* CSN, ... */
  0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, 0x3a, 0x39, /* CSNN, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ZSN, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* ZSNN, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* LDB, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* LDT, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x3a, 0x39, 0x2a, 0x29, /* LDSF, ... */
  0x2a, 0x29, 0x0a, 0x09, 0x0a, 0x09, 0xaa, 0xa9, /* LDVTS, ... */
  0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, /* STB, ... */
  0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, 0x1a, 0x19, /* STT, ... */
  0x1a, 0x19, 0x1a, 0x19, 0x0a, 0x09, 0x1a, 0x19, /* STSF, ... */
  0x0a, 0x09, 0x0a, 0x09, 0x0a, 0x09, 0xaa, 0xa9, /* SYNCD, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* OR, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* AND, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* BDIF, ... */
  0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, 0x2a, 0x29, /* MUX, ... */
  0x20, 0x20, 0x20, 0x20, 0x30, 0x30, 0x30, 0x30, /* SETH, ... */
  0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, 0x30, /* ORH, ... */
  0xc0, 0xc0, 0xe0, 0xe0, 0x60, 0x60, 0x02, 0x01, /* JMP, ... */
  0x80, 0x80, 0x00, 0x02, 0x01, 0x00, 0x20, 0x8a}; /* POP, ... */
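As a reading aid for the flags table, here is a small sketch that decodes one flag byte. Only the three names X_is_dest_bit, rel_addr_bit, and ctl_change_bit come from the text; the enumerators and helper functions below are hypothetical labels for the bit meanings listed in section 83.

```c
#include <stdint.h>

enum {
    Z_IMMEDIATE = 0x01,  /* Z is an immediate value */
    Z_SOURCE    = 0x02,  /* rZ is a source operand */
    Y_IMMEDIATE = 0x04,  /* Y is an immediate value */
    Y_SOURCE    = 0x08,  /* rY is a source operand */
    X_SOURCE    = 0x10,  /* rX is a source operand */
    X_DEST      = 0x20,  /* rX is a destination */
    REL_ADDR    = 0x40,  /* YZ is part of a relative address */
    CTL_CHANGE  = 0x80   /* control changes at this point */
};

/* Number of register source operands implied by flag byte f. */
static int source_register_count(uint8_t f) {
    return !!(f & Z_SOURCE) + !!(f & Y_SOURCE) + !!(f & X_SOURCE);
}

/* Nonzero if an instruction with flag byte f writes register X. */
static int writes_x(uint8_t f) {
    return (f & X_DEST) != 0;
}
```

For example, 0x2a (the entry for ADD) reads rY and rZ and writes rX, while 0x50 (the branches) reads rX and treats YZ as a relative address.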
84. ⟨Convert relative address to absolute address 84⟩ ≡

{
  if (i == jmp) yz = head->inst & 0xffffff;
  if (op & 1) yz -= (i == jmp ? 0x1000000 : 0x10000);
  cool->y.o = incr(head->loc, 4), cool->y.p = NULL;
  cool->z.o = incr(head->loc, yz << 2), cool->z.p = NULL;
}

This code is used in section 75.
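The address arithmetic of section 84 can be checked with ordinary 64-bit integers. The sketch below is an illustration only: relative_target is a hypothetical name, it works on a raw instruction word instead of the fetch buffer, it assumes yz starts out as the low 16 bits of the instruction, and it recognizes JMP by its opcode 0xf0/0xf1 rather than by the internal opcode jmp.

```c
#include <stdint.h>

/* Target of a relative branch or jump located at address loc.
   Odd opcodes are the backward forms, so the offset is made negative. */
static uint64_t relative_target(uint64_t loc, uint32_t inst) {
    unsigned op = inst >> 24;
    int is_jmp = (op & 0xfe) == 0xf0;  /* JMP or JMPB: 24-bit offset */
    int64_t yz = is_jmp ? (int64_t)(inst & 0xffffff)
                        : (int64_t)(inst & 0xffff);
    if (op & 1) yz -= is_jmp ? 0x1000000 : 0x10000;
    /* Offsets count tetrabytes, so scale by 4; unsigned wraparound
       gives the right answer for negative offsets. */
    return loc + (uint64_t)(yz * 4);
}
```

A backward branch with YZ = 0xffff therefore lands one instruction before itself, and a forward branch with YZ = 3 lands twelve bytes ahead.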

85. The location of the next instruction to be fetched is in a spec variable called
inst_ptr. A slightly tricky optimization of the POP instruction is made in the common
case that the speculative value of rJ is known.

⟨Redirect the fetch if control changes at this inst 85⟩ ≡

{ register int predicted = 0;
  if ((op & 0xe0) == 0x40) ⟨Predict a branch outcome 151⟩;
  head->noted = true;
  head->hist = peek_hist;
  if (predicted || (f & ctl_change_bit) || (i == syncid && !(cool->loc.h & sign_bit))) {
    old_tail = tail = new_head; /* discard all remaining fetches */
    ⟨Restart the fetch coroutine 287⟩;
    switch (i) {
    case jmp: case br: case pbr: case pushj: inst_ptr = cool->z; break;
    case pop: if (g[rJ].up->known && j < dispatch_max && !dispatch_lock && !nullifying) {
        inst_ptr.o = incr(g[rJ].up->o, yz << 2), inst_ptr.p = NULL; break;
      } /* otherwise fall through, will wait on cool->go */
    case go: case pushgo: case trap: case resume: case syncid:
      inst_ptr.p = UNKNOWN_SPEC; break;
    case trip: inst_ptr = zero_spec; break;
    }
  }
}

This code is used in section 75.



br = 69, §49.
cool: control *, §60.
dispatch_lock: lockvar, §65.
dispatch_max: int, §59.
f: register int, §75.
g: specnode [], §86.
go = 72, §49.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §68.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
inst_ptr: spec, §284.
j: register int, §12.
jmp = 80, §49.
known: bool, §40.
loc: octa, §68.
loc: octa, §44.
new_head: register fetch *, §74.
noted: bool, §68.
nullifying: bool, §315.
o: octa, §40.
old_tail: fetch *, §70.
op: register mmix_opcode, §75.
p: specnode *, §40.
pbr = 70, §49.
peek_hist: unsigned int, §99.
pop = 75, §49.
pushgo = 74, §49.
pushj = 71, §49.
resume = 76, §49.
rJ = 4, §52.
sign_bit = macro, §80.
syncid = 65, §49.
tail: fetch *, §69.
trap = 82, §49.
trip = 83, §49.
true = 1, §11.
UNKNOWN_SPEC = macro, §71.
up: specnode *, §40.
y: spec, §44.
yz: register int, §75.
z: spec, §44.
zero_spec: spec, §41.
86. At any given time the simulated machine is in two main states, the “hot state”
corresponding to instructions that have been committed and the “cool state”
corresponding to all the speculative changes currently being considered. The dispatcher
works with cool instructions and puts them into the reorder buffer, where they
gradually get warmer and warmer. Intermediate instructions, between hot and cool, have
intermediate temperatures.

A machine register like l[101] or g[250] is represented by a specnode whose o field is
the current hot value of the register. If the up and down fields of this specnode point
to the node itself, the hot and cool values of the register are identical. Otherwise
up and down are pointers to the coolest and hottest ends of a doubly linked list of
specnodes, representing intermediate speculative values (sometimes called “rename
registers”). The rename registers are implemented as the x or a specnodes inside
control blocks, for speculative instructions that use this register as a destination.
Speculative instructions that use the register as a source operand point to the
next-hottest specnode on the list, until the value becomes known. The doubly linked list
of specnodes is an input-restricted deque: A node is inserted at the cool end when the
dispatcher issues an instruction with this register as destination; a node is removed
from the cool end if an instruction needs to be deissued; a node is removed from the
hot end when an instruction is committed.

The special registers rA, rB, ... occupy the same array as the global registers g[32],
g[33], ... . For example, rB is internally the same as g[0], because rB = 0.

⟨External variables 4⟩ +≡

Extern specnode g[256]; /* global registers and special registers */
Extern specnode *l; /* the ring of local registers */
Extern int lring_size;
    /* the number of on-chip local registers (must be a power of 2) */
Extern int max_rename_regs, max_mem_slots; /* capacity of reorder buffer */
Extern int rename_regs, mem_slots; /* currently unused capacity */

87. Special register rC was the clock in the original definition of MMIX. But now the
clock is just an external variable, called ticks.

⟨External variables 4⟩ +≡

Extern octa ticks; /* the internal clock */

88. ⟨Global variables 20⟩ +≡

int lring_mask; /* for calculations modulo lring_size */

89. The addr fields in the specnode lists for registers are used to identify that
register in diagnostic messages. Such addresses are negative; memory addresses are
positive.

All registers are initially zero except rG, which is initially 255, and rN, which has
a constant value identifying the time of compilation. (The macro ABSTIME is defined
externally in the file abstime.h, which should have just been created by ABSTIME;
ABSTIME is a trivial program that computes the value of the standard library function
time(NULL). We assume that this number, which is the number of seconds in the “UNIX
epoch,” is less than 2^32. Beware: Our assumption will fail in February of 2106.)

#define VERSION 1 /* version of the MMIX architecture that we support */
#define SUBVERSION 0 /* secondary byte of version number */
#define SUBSUBVERSION 0 /* further qualification to version number */

⟨Initialize everything 22⟩ +≡

rename_regs = max_rename_regs;
mem_slots = max_mem_slots;
lring_mask = lring_size - 1;
for (j = 0; j < 256; j++) {
  g[j].addr.h = sign_bit, g[j].addr.l = j, g[j].known = true;
  g[j].up = g[j].down = &g[j];
}
g[rG].o.l = 255;
g[rN].o.h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
g[rN].o.l = ABSTIME; /* see comment and warning above */
for (j = 0; j < lring_size; j++) {
  l[j].addr.h = sign_bit, l[j].addr.l = 256 + j, l[j].known = true;
  l[j].up = l[j].down = &l[j];
}
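The packing of the version bytes into the high tetra of rN, and the 2^32 limit on the timestamp, can be illustrated as follows. This is a sketch with hypothetical helper names (pack_version, fits_in_tetra); the simulator itself just evaluates the macro expression shown above.

```c
#include <stdint.h>

/* Pack three version bytes into the top three bytes of a tetra. */
static uint32_t pack_version(unsigned version, unsigned subversion,
                             unsigned subsubversion) {
    return ((uint32_t)version << 24) + ((uint32_t)subversion << 16)
         + ((uint32_t)subsubversion << 8);
}

/* Nonzero if a count of seconds since the Unix epoch still fits in
   the 32-bit low tetra of rN. */
static int fits_in_tetra(uint64_t seconds) {
    return seconds < ((uint64_t)1 << 32);
}
```

Since 2^32 seconds is a little over 136 years, a counter that starts in 1970 overflows a tetra early in 2106, which is the date the warning refers to.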

90. ⟨Internal prototypes 13⟩ +≡

static void print_specnode_id ARGS((octa));

91. ⟨Subroutines 14⟩ +≡

static void print_specnode_id(a)
  octa a;
{
  if (a.h == sign_bit) {
    if (a.l < 32) printf(special_name[a.l]);
    else if (a.l < 256) printf("g[%d]", a.l);
    else printf("l[%d]", a.l - 256);
  } else if (a.h != (tetra) -1) {
    printf("m["); print_octa(a); printf("]");
  }
}
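The register-address convention of section 89 (high tetra equal to the sign bit, low tetra 0 to 255 for g and 256 upward for l) can be exercised with a buffer-based variant of the routine above. This sketch is hypothetical: format_register_id is not in the program, and it prints "special[n]" where the real routine looks up special_name, since that table is not shown here.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a readable name for the register whose specnode address is
   (h, l) into buf, which holds n bytes. */
static void format_register_id(uint32_t h, uint32_t l, char *buf, size_t n) {
    if (h != 0x80000000u) {
        snprintf(buf, n, "not a register");
        return;
    }
    if (l < 32) snprintf(buf, n, "special[%u]", (unsigned)l);
    else if (l < 256) snprintf(buf, n, "g[%u]", (unsigned)l);
    else snprintf(buf, n, "l[%u]", (unsigned)(l - 256));
}
```

With this encoding l[101] has low tetra 357, so diagnostic output can name a register from the address alone.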



a: specnode, §44.
ABSTIME = macro, abstime.h.
addr: octa, §40.
ARGS = macro, §6.
cool: control *, §60.
down: specnode *, §40.
Extern = macro, §4.
h: tetra, §17.
hot: control *, §60.
j: register int, §10.
known: bool, §40.
l: tetra, §17.
o: octa, §40.
octa = struct, §17.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
rB = 0, §52.
rG = 19, §52.
rN = 9, §52.
sign_bit = macro, §80.
special_name: char *[], §53.
specnode = struct, §40.
tetra = unsigned int, §17.
time: time_t (), <time.h>.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
92. The specval subroutine produces a spec corresponding to the currently coolest
value of a given local or global register.

⟨Internal prototypes 13⟩ +≡

static spec specval ARGS((specnode *));

93. ⟨Subroutines 14⟩ +≡

static spec specval(r)
  specnode *r;
{ spec res;
  if (r->up->known) res.o = r->up->o, res.p = NULL;
  else res.p = r->up;
  return res;
}

94. The spec_install subroutine introduces a new speculative value at the cool end
of a given doubly linked list.

⟨Internal prototypes 13⟩ +≡

static void spec_install ARGS((specnode *, specnode *));

95. ⟨Subroutines 14⟩ +≡

static void spec_install(r, t) /* insert t into list r */
  specnode *r, *t;
{
  t->up = r->up;
  t->up->down = t;
  r->up = t;
  t->down = r;
  t->addr = r->addr;
}

96. Conversely, spec_rem takes such a value out.

⟨Internal prototypes 13⟩ +≡

static void spec_rem ARGS((specnode *));

97. ⟨Subroutines 14⟩ +≡

static void spec_rem(t) /* remove t from its list */
  specnode *t;
{ register specnode *u = t->up, *d = t->down;
  u->down = d; d->up = u;
}
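The three list routines are easiest to follow on a stripped-down node. The sketch below mirrors their pointer moves with hypothetical names (node, node_init, node_install, node_remove, coolest_value); it keeps a single 64-bit value per node and omits the addr field.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct node {
    uint64_t o;              /* value, meaningful when known is nonzero */
    int known;
    struct node *up, *down;  /* from a list head: coolest and hottest ends */
} node;

/* A register with no speculative values points at itself. */
static void node_init(node *r, uint64_t v) {
    r->o = v; r->known = 1; r->up = r->down = r;
}

/* Insert t at the cool end of the list headed by r. */
static void node_install(node *r, node *t) {
    t->up = r->up; t->up->down = t; r->up = t; t->down = r;
}

/* Unlink t from whichever list it is on. */
static void node_remove(node *t) {
    node *u = t->up, *d = t->down;
    u->down = d; d->up = u;
}

/* Store the coolest value in *out and return 1 if it is known;
   otherwise return 0 (the caller must wait on r->up). */
static int coolest_value(const node *r, uint64_t *out) {
    if (r->up->known) { *out = r->up->o; return 1; }
    return 0;
}
```

Installing two nodes and then removing the cooler one models a deissue, while removing the hotter one models a commit; in both cases the head ends up pointing at itself again once the list is empty.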
98. Some special registers are so central to MMIX’s operation, they are carried along
with each control block in the reorder buffer instead of being treated as source and
destination registers of each instruction. For example, the register stack pointers
rO and rS are treated in this way. The normal specnodes for rO and rS, namely
g[rO] and g[rS], are not actually used; the cool values are called cool_O and cool_S.
(Actually cool_O and cool_S correspond to the register values divided by 8, since rO
and rS are always multiples of 8.)

The arithmetic status register, rA, is also treated specially. Its event bits are kept
up to date only at the “hot” end, by accumulating values of arith_exc; an instruction
to GET the value of rA will be executed only in the hot seat. The other bits of rA,
which are needed to control trip handlers and floating point rounding, are treated in
the normal way.

⟨External variables 4⟩ +≡

Extern octa cool_O, cool_S; /* values of rO, rS before the cool instruction */

99. ⟨Global variables 20⟩ +≡

int cool_L, cool_G; /* values of rL and rG before the cool instruction */
unsigned int cool_hist, peek_hist; /* history bits for branch prediction */
octa new_O, new_S; /* values of rO, rS after cool */



addr: octa, §40.
ARGS = macro, §6.
arith_exc: unsigned int, §44.
cool: control *, §60.
down: specnode *, §40.
Extern = macro, §4.
g: specnode [], §86.
known: bool, §40.
o: octa, §40.
octa = struct, §17.
p: specnode *, §40.
rO = 10, §52.
rS = 11, §52.
spec = struct, §40.
specnode = struct, §40.
up: specnode *, §40.
100. ⟨Install default fields in the cool block 100⟩ ≡

cool->op = op; cool->i = i;
cool->xx = (head->inst >> 16) & 0xff; cool->yy = (head->inst >> 8) & 0xff;
cool->zz = (head->inst) & 0xff;
cool->loc = head->loc;
cool->y = cool->z = cool->b = cool->ra = zero_spec;
cool->x.o = cool->a.o = cool->rl.o = zero_octa;
cool->x.known = false;
cool->x.up = NULL;
cool->a.known = false;
cool->a.up = NULL;
cool->rl.known = true;
cool->rl.up = NULL;
cool->need_b = cool->need_ra = cool->ren_x = cool->mem_x = cool->ren_a = cool->set_l = false;
cool->arith_exc = cool->denin = cool->denout = 0;
if ((head->loc.h & sign_bit) && !(g[rU].o.h & 0x8000)) cool->usage = false;
else cool->usage = ((op & (g[rU].o.h >> 16)) == g[rU].o.h >> 24 ? true : false);
new_O = cool->cur_O = cool_O; new_S = cool->cur_S = cool_S;
cool->interrupt = head->interrupt;
cool->hist = peek_hist;
cool->go.o = incr(cool->loc, 4);
cool->go.known = false, cool->go.addr.h = -1, cool->go.up = (specnode *) cool;
cool->interim = cool->stack_alert = false;

This code is used in section 75.
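Two of the computations in section 100 are pure bit manipulation and can be tried separately: splitting the instruction word into its OP, X, Y, Z bytes, and the rU test that compares the opcode against a pattern byte under a mask byte. The names fields, split_instruction, and usage_counted are hypothetical, and the sketch ignores the separate check for negative addresses.

```c
#include <stdint.h>

typedef struct { unsigned op, xx, yy, zz; } fields;

/* Split a 32-bit instruction into its four bytes. */
static fields split_instruction(uint32_t inst) {
    fields f = { inst >> 24, (inst >> 16) & 0xff,
                 (inst >> 8) & 0xff, inst & 0xff };
    return f;
}

/* rU_high is the high tetra of rU: bits 31..24 hold the pattern and
   bits 23..16 the mask. The opcode is counted when (op & mask)
   equals the pattern. */
static int usage_counted(unsigned op, uint32_t rU_high) {
    return (op & ((rU_high >> 16) & 0xff)) == (rU_high >> 24);
}
```

With pattern 0x20 and mask 0xf0, for example, every opcode from 0x20 through 0x2f is counted, and with both bytes zero every opcode matches.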

101. ⟨Dispatch an instruction to the cool block if possible, otherwise goto stall 101⟩ ≡

if (new_cool == hot) goto stall; /* reorder buffer is full */
⟨Make sure cool_L and cool_G are up to date 102⟩;
⟨Install the operand fields of the cool block 103⟩;
if (f & X_is_dest_bit) ⟨Install register X as the destination, or insert an internal
    command and goto dispatch_done if X is marginal 110⟩;
switch (i) {
  ⟨Special cases of instruction dispatch 117⟩
default: break;
}
dispatch_done:

This code is used in section 75.

102. The UNSAVE operation begins by loading register rG from memory. We don’t
really need to know the value of rG until twelve other registers have been unsaved, so
we aren’t fussy about it here.

⟨Make sure cool_L and cool_G are up to date 102⟩ ≡

if (!g[rL].up->known) goto stall;
cool_L = g[rL].up->o.l;
if (!g[rG].up->known && !(op == UNSAVE && cool->xx == 1)) goto stall;
cool_G = g[rG].up->o.l;

This code is used in section 101.
103. ⟨Install the operand fields of the cool block 103⟩ ≡

if (resuming) ⟨Insert special operands when resuming an interrupted operation 324⟩
else {
  if (f & 0x10) ⟨Set cool->b from register X 106⟩
  if (third_operand[op] && (cool->i != trap))
    ⟨Set cool->b and/or cool->ra from special register 108⟩;
  if (f & 0x1) cool->z.o.l = cool->zz;
  else if (f & 0x2) ⟨Set cool->z from register Z 104⟩
  else if ((op & 0xf0) == 0xe0) ⟨Set cool->z as an immediate wyde 109⟩;
  if (f & 0x4) cool->y.o.l = cool->yy;
  else if (f & 0x8) ⟨Set cool->y from register Y 105⟩
}

This code is used in section 101.

104. ⟨Set cool->z from register Z 104⟩ ≡

{
  if (cool->zz >= cool_G) cool->z = specval(&g[cool->zz]);
  else if (cool->zz < cool_L) cool->z = specval(&l[(cool_O.l + cool->zz) & lring_mask]);
}

This code is used in section 103.

105. ⟨Set cool->y from register Y 105⟩ ≡

{
  if (cool->yy >= cool_G) cool->y = specval(&g[cool->yy]);
  else if (cool->yy < cool_L) cool->y = specval(&l[(cool_O.l + cool->yy) & lring_mask]);
}

This code is used in section 103.
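Sections 104 and 105 share one decision: a register number at or above G is global, one below L is local and lives at a rotated position in the ring, and anything in between is marginal and keeps the default zero operand. A minimal sketch of that decision, with hypothetical names reg_kind and locate_register and plain unsigned values in place of cool_G, cool_L, cool_O.l, and lring_mask:

```c
typedef enum { REG_GLOBAL, REG_LOCAL, REG_MARGINAL } reg_kind;

/* Classify register number r. For a global register *slot is the
   index into g; for a local register it is the index into the ring,
   computed modulo the ring size (ring_mask is size minus 1). */
static reg_kind locate_register(unsigned r, unsigned G, unsigned L,
                                unsigned O, unsigned ring_mask,
                                unsigned *slot) {
    if (r >= G) { *slot = r; return REG_GLOBAL; }
    if (r < L) { *slot = (O + r) & ring_mask; return REG_LOCAL; }
    return REG_MARGINAL;
}
```

Because the ring size is a power of 2, the mask makes local register numbers wrap around the ring without a division.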



a: specnode, §44.
addr: octa, §40.
arith_exc: unsigned int, §44.
b: spec, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cur_O: octa, §44.
cur_S: octa, §44.
denin: int, §44.
denout: int, §44.
f: register int, §75.
false = 0, §11.
g: specnode [], §86.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §44.
hot: control *, §60.
i: internal_opcode, §44.
i: register int, §12.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §68.
interrupt: unsigned int, §44.
known: bool, §40.
l: tetra, §17.
l: specnode *, §86.
loc: octa, §44.
loc: octa, §68.
lring_mask: int, §88.
mem_x: bool, §44.
need_b: bool, §44.
need_ra: bool, §44.
new_cool: control *, §78.
new_O: octa, §99.
new_S: octa, §99.
o: octa, §40.
op: register mmix_opcode, §75.
op: mmix_opcode, §44.
peek_hist: unsigned int, §99.
ra: spec, §44.
ren_a: bool, §44.
ren_x: bool, §44.
resuming: int, §78.
rG = 19, §52.
rl: specnode, §44.
rL = 20, §52.
rU = 17, §52.
set_l: bool, §44.
sign_bit = macro, §80.
specnode = struct, §40.
specval: static spec (), §93.
stack_alert: bool, §44.
stall: label, §75.
third_operand: unsigned char [], §107.
trap = 82, §49.
true = 1, §11.
UNSAVE = 0xfb, §47.
up: specnode *, §40.
usage: bool, §44.
x: specnode, §44.
X_is_dest_bit = 0x20, §83.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
zero_spec: spec, §41.
zz: unsigned char, §44.
106. ⟨Set cool->b from register X 106⟩ ≡

{
  if (cool->xx >= cool_G) cool->b = specval(&g[cool->xx]);
  else if (cool->xx < cool_L) cool->b = specval(&l[(cool_O.l + cool->xx) & lring_mask]);
  if (f & rel_addr_bit) cool->need_b = true; /* br, pbr */
}

This code is used in section 103.



107. If an operation requires a special register as third operand, that register is
listed in the third_operand table.

⟨Global variables 20⟩ +≡

unsigned char third_operand[256] = {
  0, rA, 0, 0, rA, rA, rA, rA, /* TRAP, ... */
  rA, rA, rA, rA, rA, rA, rA, rA, /* FLOT, ... */
  rA, rE, rE, rE, rA, rA, rA, rA, /* FMUL, ... */
  rA, rA, 0, 0, rA, rA, rD, rD, /* MUL, ... */
  rA, rA, 0, 0, rA, rA, 0, 0, /* ADD, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* 2ADDU, ... */
  0, 0, 0, 0, rA, rA, 0, 0, /* CMP, ... */
  rA, rA, 0, 0, 0, 0, 0, 0, /* SL, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* BN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* BNN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* PBN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* PBNN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* CSN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* CSNN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* ZSN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* ZSNN, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* LDB, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* LDT, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* LDSF, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* LDVTS, ... */
  rA, rA, 0, 0, rA, rA, 0, 0, /* STB, ... */
  rA, rA, 0, 0, 0, 0, 0, 0, /* STT, ... */
  rA, rA, 0, 0, 0, 0, 0, 0, /* STSF, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* SYNCD, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* OR, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* AND, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* BDIF, ... */
  rM, rM, 0, 0, 0, 0, 0, 0, /* MUX, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* SETH, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* ORH, ... */
  0, 0, 0, 0, 0, 0, 0, 0, /* JMP, ... */
  rJ, 0, 0, 0, 0, 0, 0, 255}; /* POP, ... */
108. The cool->b field is busy in operations like STB or STSF, which need rA. So we
use cool->ra instead, when rA is needed.

⟨Set cool->b and/or cool->ra from special register 108⟩ ≡

{
  if (third_operand[op] == rA || third_operand[op] == rE)
    cool->need_ra = true, cool->ra = specval(&g[rA]);
  if (third_operand[op] != rA)
    cool->need_b = true, cool->b = specval(&g[third_operand[op]]);
}

This code is used in section 103.

109. ⟨Set cool->z as an immediate wyde 109⟩ ≡

{
  switch (op & 3) {
  case 0: cool->z.o.h = yz << 16; break;
  case 1: cool->z.o.h = yz; break;
  case 2: cool->z.o.l = yz << 16; break;
  case 3: cool->z.o.l = yz; break;
  }
  if (i != set) { /* register X should also be the Y operand */
    cool->y = cool->b;
    cool->b = zero_spec;
  }
}

This code is used in section 103.
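The four cases of section 109 place the 16-bit YZ field in one of the four wydes of an octabyte, chosen by the low two bits of the opcode. The sketch below expresses the same placement as a single shift on a 64-bit value; wyde_operand is a hypothetical name, and uint64_t replaces the octa pair (h, l).

```c
#include <stdint.h>

/* Position the 16-bit immediate yz in the wyde selected by op & 3:
   0 is the high wyde, 1 medium high, 2 medium low, 3 low. */
static uint64_t wyde_operand(unsigned op, uint32_t yz) {
    int shift = 48 - 16 * (int)(op & 3);
    return (uint64_t)(yz & 0xffff) << shift;
}
```

Thus SETH (0xe0) with YZ = 0x1234 yields 0x1234000000000000, and the opcode three places later yields 0x1234 in the lowest wyde.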



b: spec, §44.
br = 69, §49.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
f: register int, §75.
g: specnode [], §86.
h: tetra, §17.
i: register int, §12.
l: specnode *, §86.
l: tetra, §17.
lring_mask: int, §88.
need_b: bool, §44.
need_ra: bool, §44.
o: octa, §40.
op: register mmix_opcode, §75.
pbr = 70, §49.
rA = 21, §52.
ra: spec, §44.
rD = 1, §52.
rE = 2, §52.
rel_addr_bit = 0x40, §83.
rJ = 4, §52.
rM = 5, §52.
set = 33, §49.
specval: static spec (), §93.
true = 1, §11.
xx: unsigned char, §44.
y: spec, §44.
yz: register int, §75.
z: spec, §44.
zero_spec: spec, §41.
110. ⟨Install register X as the destination, or insert an internal command and goto
dispatch_done if X is marginal 110⟩ ≡

{
  if (cool->xx >= cool_G) {
    if (i != pushgo && i != pushj && i != cswap)
      cool->ren_x = true, spec_install(&g[cool->xx], &cool->x);
  } else if (cool->xx < cool_L) {
    if (i != cswap)
      cool->ren_x = true, spec_install(&l[(cool_O.l + cool->xx) & lring_mask], &cool->x);
  } else { /* we need to increase L before issuing head->inst */
  increase_L: if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) == 0)
      ⟨Insert an instruction to advance gamma 113⟩
    else ⟨Insert an instruction to advance beta and L 112⟩;
  }
}

This code is used in section 101.
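The test at increase_L decides whether the local register ring has room for one more register. A minimal sketch of that decision follows, with hypothetical names ring_step and next_ring_step; S, O, and L stand for cool_S.l, cool_O.l, and cool_L, and unsigned wraparound takes the place of the tetra arithmetic.

```c
typedef enum { ADVANCE_GAMMA, ADVANCE_BETA } ring_step;

/* If S is congruent to O + L + 1 modulo the ring size, growing L by
   one would run beta into gamma, so a register must first be stored
   to memory (advance gamma). Otherwise beta and L can advance. */
static ring_step next_ring_step(unsigned S, unsigned O, unsigned L,
                                unsigned ring_mask) {
    return ((S - O - L - 1) & ring_mask) == 0 ? ADVANCE_GAMMA
                                              : ADVANCE_BETA;
}
```

With a ring of 8 registers, O = 0, and S = 0, the ring accepts new locals until L reaches 7; the next increase forces a store.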

111. ⟨Check for sufficient rename registers and memory slots, or goto stall 111⟩ ≡

if (rename_regs < cool->ren_x + cool->ren_a) goto stall;
if (cool->mem_x)
  if (mem_slots) mem_slots--; else goto stall;
rename_regs -= cool->ren_x + cool->ren_a;

This code is used in section 75.

112. The incrl instruction advances β and rL by 1 at a time when we know that
β ≠ γ, in the ring of local registers.

⟨Insert an instruction to advance beta and L 112⟩ ≡

{
  cool->i = incrl;
  spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
  cool->need_b = cool->need_ra = false;
  cool->y = cool->z = zero_spec;
  cool->x.known = true; /* cool->x.o = zero_octa */
  spec_install(&g[rL], &cool->rl);
  cool->rl.o.l = cool_L + 1;
  cool->ren_x = cool->set_l = true;
  op = SETH; /* this instruction to be handled by the simplest units */
  cool->interim = true;
  goto dispatch_done;
}

This code is used in section 110.

113. The incgamma instruction advances γ and rS by storing an octabyte from the
local register ring to virtual memory location cool_S << 3.

⟨Insert an instruction to advance gamma 113⟩ ≡

{
  cool->need_b = cool->need_ra = false;
  cool->i = incgamma;
  new_S = incr(cool_S, 1);
  cool->b = specval(&l[cool_S.l & lring_mask]);
  cool->y.p = NULL, cool->y.o = shift_left(cool_S, 3);
  cool->z = zero_spec;
  cool->mem_x = true, spec_install(&mem, &cool->x);
  op = STOU; /* this instruction needs to be handled by load/store unit */
  cool->interim = true;
  cool->stack_alert = !(cool->y.o.h & sign_bit);
  goto dispatch_done;
}

This code is used in sections 110, 119, and 337.



b: spec, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cswap = 68, §49.
dispatch_done: label, §101.
false = 0, §11.
g: specnode [], §86.
h: tetra, §17.
head: fetch *, §69.
i: register int, §12.
i: internal_opcode, §44.
incgamma = 84, §49.
incr: octa (), MMIX-ARITH §6.
incrl = 86, §49.
inst: tetra, §68.
interim: bool, §44.
known: bool, §40.
l: specnode *, §86.
l: tetra, §17.
lring_mask: int, §88.
mem: specnode, §115.
mem_slots: int, §86.
mem_x: bool, §44.
need_b: bool, §44.
need_ra: bool, §44.
new_S: octa, §99.
o: octa, §40.
op: register mmix_opcode, §75.
p: specnode *, §40.
pushgo = 74, §49.
pushj = 71, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rename_regs: int, §86.
rl: specnode, §44.
rL = 20, §52.
set_l: bool, §44.
SETH = 0xe0, §47.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
spec_install: static void (), §95.
specval: static spec (), §93.
stack_alert: bool, §44.
stall: label, §75.
STOU = 0xae, §47.
true = 1, §11.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
zero_spec: spec, §41.
114. The decgamma instruction decreases γ and rS by loading an octabyte from
virtual memory location (cool_S − 1) << 3 into the local register ring. The value of β
may need to be decreased too (by decreasing rL).

⟨Insert an instruction to decrease gamma 114⟩ ≡

{
  if (cool_O.l + cool_L == cool_S.l + lring_size) { /* don’t let γ pass β */
    if (cool->i == pop && cool->xx == cool_L && cool_L > 1) {
      cool->i = or; /* we’ll preserve the main result by moving it down */
      head->inst -= 0x10000; /* decrease X field of POP in fetch buffer */
      op = OR;
      cool->y = specval(&l[(cool_O.l + cool->xx - 1) & lring_mask]);
      spec_install(&l[(cool_O.l + cool->xx - 2) & lring_mask], &cool->x);
    } else { /* decrease rL by 1 */
      spec_install(&g[rL], &cool->rl); cool->rl.o.l = cool_L - 1; cool->set_l = true;
    }
  }
  if (cool->i != or) {
    cool->i = decgamma;
    new_S = incr(cool_S, -1);
    cool->y.p = NULL, cool->y.o = shift_left(new_S, 3);
    spec_install(&l[new_S.l & lring_mask], &cool->x);
    op = LDOU; /* this instruction needs to be handled by load/store unit */
    cool->ptr_a = (void *) mem.up;
  }
  cool->z = cool->b = zero_spec; cool->need_b = false; cool->ren_x = cool->interim = true;
  goto dispatch_done;
}

This code is used in section 120.

115. Storing into memory requires a doubly linked data list of specnodes like the 
lists we use for local and global registers. In this case the head of the list is called 
mem, and the addr fields are physical addresses in memory. 

( External variables 4 ) += 

Extern specnode mem; 

116. The addr field of a memory specnode is all 1s until the physical address has
been computed.

⟨Initialize everything 22⟩ +≡

mem.addr.h = mem.addr.l = -1;
mem.up = mem.down = &mem;

117. The CSWAP operation is treated as a partial store, with $X as a secondary
output. Partial store (pst) commands read an octabyte from memory before they
write it.

⟨Special cases of instruction dispatch 117⟩ ≡

case cswap: cool->ren_a = true;
  spec_install(cool->xx >= cool_G ? &g[cool->xx] : &l[(cool_O.l + cool->xx) & lring_mask],
      &cool->a);
  cool->i = pst;
case st: if ((op & 0xfe) == STCO) cool->b.o.l = cool->xx;
case pst: cool->mem_x = true, spec_install(&mem, &cool->x); break;
case ld: case ldunc: cool->ptr_a = (void *) mem.up; break;

See also sections 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 347, and 355.

This code is used in section 101.

118. When new data is PUT into special registers 8 or 15-20 (namely rC, rK, rQ, rU, 
rV, rG, or rL) it can affect many things. Therefore we stop issuing further instructions 
until such PUTs are committed. Moreover, we will see later that such drastic PUTs 
defer execution until they reach the hot seat. 

(Special cases of instruction dispatch 117) += 
case put: if (cool->yy != 0 || cool->xx >= 32) goto illegal_inst;
  if (cool->xx >= 8) {
    if (cool->xx <= 11 && cool->xx != 8) goto illegal_inst;
    if (cool->xx <= 18 && !(cool->loc.h & sign_bit)) goto privileged_inst;
  }
  if (cool->xx == 8 || (cool->xx >= 15 && cool->xx <= 20)) freeze_dispatch = true;
  cool->ren_x = true, spec_install(&g[cool->xx], &cool->x); break;
case get: if (cool->yy || cool->zz >= 32) goto illegal_inst;
  if (cool->zz == rO) cool->z.o = shift_left(cool_O, 3);
  else if (cool->zz == rS) cool->z.o = shift_left(cool_S, 3);
  else cool->z = specval(&g[cool->zz]); break;
illegal_inst: cool->interrupt |= B_BIT; goto noop_inst;
case ldvts: if (cool->loc.h & sign_bit) break;
privileged_inst: cool->interrupt |= K_BIT;
noop_inst: cool->i = noop; break;
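
The PUT checks can be read as a small classification function. The sketch below is only an illustration: classify_put and the enum are hypothetical names, and the comparison operators are assumed to be the inclusive ones (register numbers 0 to 31 are valid, 9 to 11 are unwritable, 8 and 12 to 18 are privileged).

```c
#include <stdbool.h>

enum put_result { PUT_OK, PUT_ILLEGAL, PUT_PRIVILEGED };

/* xx, yy: the X and Y fields of the PUT instruction.
   negative_loc: true when the instruction sits at a negative address.
   *freeze is set when dispatch must pause until the PUT is committed. */
static enum put_result classify_put(unsigned xx, unsigned yy,
                                    bool negative_loc, bool *freeze) {
  *freeze = false;
  if (yy != 0 || xx >= 32) return PUT_ILLEGAL;
  if (xx >= 8) {
    if (xx <= 11 && xx != 8) return PUT_ILLEGAL;           /* 9, 10, 11 */
    if (xx <= 18 && !negative_loc) return PUT_PRIVILEGED;  /* 8, 12..18 */
  }
  if (xx == 8 || (xx >= 15 && xx <= 20)) *freeze = true;   /* rC, rK..rL */
  return PUT_OK;
}
```

For example, a user program may PUT into rL (register 20), but dispatch still freezes until that PUT is committed.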



a: specnode, §44.
addr: octa, §40.
b: spec, §44.
B_BIT = 1 << 2, §54.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
cswap = 68, §49.
decgamma = 85, §49.
dispatch_done: label, §101.
down: specnode *, §40.
Extern = macro, §4.
false = 0, §11.
freeze_dispatch: register bool, §75.
g: specnode [], §86.
get = 54, §49.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §68.
interim: bool, §44.
interrupt: unsigned int, §44.
K_BIT = 1 << 3, §54.
l: specnode *, §86.
l: tetra, §17.
ld = 56, §49.
LDOU = 0x8e, §47.
ldunc = 59, §49.
ldvts = 60, §49.
loc: octa, §44.
lring_mask: int, §88.
lring_size: int, §86.
mem_x: bool, §44.
need_b: bool, §44.
new_S: octa, §99.
noop = 81, §49.
o: octa, §40.
op: register mmix_opcode, §75.
OR = 0xc0, §47.
or = 34, §49.
p: specnode *, §40.
pop = 75, §49.
pst = 66, §49.
ptr_a: void *, §44.
put = 55, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rl: specnode, §44.
rL = 20, §52.
rO = 10, §52.
rS = 11, §52.
set_l: bool, §44.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
spec_install: static void (), §95.
specnode = struct, §40.
specval: static spec (), §93.
st = 63, §49.
STCO = 0xb4, §47.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yy: unsigned char, §44.
z: spec, §44.
zero_spec: spec, §41.
zz: unsigned char, §44.

119. A PUSHGO instruction with X ≥ G causes L to increase momentarily by 1, even
if L = G. But the value of L will be decreased before the PUSHGO is complete, so it
will never actually exceed G. Moreover, we needn’t insert an incrl command.

(Special cases of instruction dispatch 117) +=
case pushgo: inst_ptr.p = &cool->go;
case pushj:
  { register int x = cool->xx;
    if (x >= cool_G) {
      if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) == 0)
        (Insert an instruction to advance gamma 113)
      x = cool_L; cool_L++;
      cool->ren_x = true, spec_install(&l[(cool_O.l + x) & lring_mask], &cool->x);
    }
    cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = x;
    cool->ren_a = true, spec_install(&g[rJ], &cool->a);
    cool->a.known = true, cool->a.o = incr(cool->loc, 4);
    cool->set_l = true, spec_install(&g[rL], &cool->rl); cool->rl.o.l = cool_L - x - 1;
    new_O = incr(cool_O, x + 1);
  } break;
case syncid: if (cool->loc.h & sign_bit) break;
case go: inst_ptr.p = &cool->go; break;
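
The arithmetic on L and rO in the push case can be traced with a small sketch. It is a simplification under stated assumptions: the regstack type and push function are hypothetical, the register ring and the advance of gamma are ignored, and x < L is assumed whenever x < G (otherwise the unsigned subtraction would wrap).

```c
/* Hypothetical model of the cool register-stack state. */
typedef struct {
  unsigned L;   /* number of local registers (rL) */
  unsigned G;   /* global threshold (rG) */
  unsigned O;   /* register stack offset, counted in octabytes (rO/8) */
} regstack;

/* Returns the hole value x that the push stores in the register stack. */
static unsigned push(regstack *s, unsigned x) {
  if (x >= s->G) {      /* X is global: use L, which grows momentarily */
    x = s->L;
    s->L++;
  }
  s->L = s->L - x - 1;  /* registers above the hole stay visible */
  s->O += x + 1;        /* the hole and everything below it are hidden */
  return x;
}
```

Starting from L = 5 and G = 32, a push with X = 2 leaves L = 2 and moves O up by 3, while a push with X = 40 stores the hole value 5, leaves L = 0, and moves O up by 6.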

120. We need to know the topmost “hidden” element of the register stack when
a POP instruction is dispatched. This element is usually present in the local register
ring, unless γ = α.

Once it is known, let x be its least significant byte. We will be decreasing rO by
x + 1, so we may have to decrease γ repeatedly in order to maintain the condition
rS ≤ rO.

(Special cases of instruction dispatch 117) +=
case pop: if (cool->xx && cool_L >= cool->xx)
    cool->y = specval(&l[(cool_O.l + cool->xx - 1) & lring_mask]);
pop_unsave: if (cool_S.l == cool_O.l) (Insert an instruction to decrease gamma 114);
  { register tetra x;
    register int new_L;
    register specnode *p = l[(cool_O.l - 1) & lring_mask].up;
    if (p->known) x = (p->o.l) & 0xff; else goto stall;
    if ((tetra) (cool_O.l - cool_S.l) <= x) (Insert an instruction to decrease gamma 114);
    new_O = incr(cool_O, -x - 1);
    if (cool->i == pop) new_L = x + (cool->xx <= cool_L ? cool->xx : cool_L + 1);
    else new_L = x;
    if (new_L > cool_G) new_L = cool_G;
    if (x < new_L)
      cool->ren_x = true, spec_install(&l[(cool_O.l - 1) & lring_mask], &cool->x);
    cool->set_l = true, spec_install(&g[rL], &cool->rl); cool->rl.o.l = new_L;
    if (cool->i == pop) {
      cool->z.o.l = yz << 2;
      if (inst_ptr.p == UNKNOWN_SPEC && new_head == tail) inst_ptr.p = &cool->go;
    }
    break;
  }

121. (Special cases of instruction dispatch 117) +=
case mulu: cool->ren_a = true, spec_install(&g[rH], &cool->a); break;
case div: case divu: cool->ren_a = true, spec_install(&g[rR], &cool->a); break;

122. It’s tempting to say that we could avoid taking up space in the reorder buffer 
when no operation needs to be done. A JMP instruction qualifies as a no-op in this 
sense, because the change of control occurs before the execution stage. However, 
even a no-op might have to be counted in the usage register rU, so it might get into 
the execution stage for that reason. A no-op can also cause a protection interrupt, 
if it appears in a negative location. Even more importantly, a program might get 
into a loop that consists entirely of jumps and no-ops; then we wouldn’t be able to 
interrupt it, because the interruption mechanism needs to find the current location 
in the reorder buffer! At least one functional unit therefore needs to provide explicit 
support for JMP, JMPB, and SWYM. 

The SWYM instruction with F_BIT set is a special case: This is a request from the 
fetch coroutine for an update to the IT-cache, when the page table method isn’t 
implemented in hardware. 

(Special cases of instruction dispatch 117) +=
case noop: if (cool->interrupt & F_BIT) {
    cool->go.o = cool->y.o = cool->loc, inst_ptr = specval(&g[rT]);
  }
  break;

123. (Undo data structures set prematurely in the cool block and break 123) =
if (cool->ren_x || cool->mem_x) spec_rem(&cool->x);
if (cool->ren_a) spec_rem(&cool->a);
if (cool->set_l) spec_rem(&cool->rl);
if (inst_ptr.p == &cool->go) inst_ptr.p = UNKNOWN_SPEC;
break;

This code is used in section 75. 



a: specnode, §44.
cool: control *, §60.
cool_G: int, §99.
cool_L: int, §99.
cool_O: octa, §98.
cool_S: octa, §98.
div = 9, §49.
divu = 28, §49.
F_BIT = 1 << 17, §54.
g: specnode [], §86.
go = 72, §49.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
incrl = 86, §49.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
known: bool, §40.
l: specnode *, §86.
l: tetra, §17.
loc: octa, §44.
lring_mask: int, §88.
mem_x: bool, §44.
mulu = 27, §49.
new_head: register fetch *, §74.
new_O: octa, §99.
noop = 81, §49.
o: octa, §40.
p: specnode *, §40.
pop = 75, §49.
pushgo = 74, §49.
pushj = 71, §49.
ren_a: bool, §44.
ren_x: bool, §44.
rH = 3, §52.
rJ = 4, §52.
rl: specnode, §44.
rL = 20, §52.
rR = 6, §52.
rT = 13, §52.
set_l: bool, §44.
sign_bit = macro, §80.
spec_install: static void (), §95.
spec_rem: static void (), §97.
specnode = struct, §40.
specval: static spec (), §93.
stall: label, §75.
syncid = 65, §49.
tail: fetch *, §69.
tetra = unsigned int, §17.
true = 1, §11.
UNKNOWN_SPEC = macro, §71.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
yz: register int, §75.
z: spec, §44.

124. The execution stages. MMIX’s raison d’être is its ability to execute
instructions. So now we want to simulate the behavior of its functional units.

Each coroutine scheduled for action at the current tick of the clock has a stage 
number corresponding to a particular subset of the MMIX hardware. For example, the 
coroutines with stage = 2 are the second stages in the pipelines of the functional 
units. A coroutine with stage = 0 works in the fetch unit. Several artificially large 
stage numbers are used to control special coroutines that do things like write data 
from buffers into memory. 

In this program the current coroutine of interest is called self; hence self->stage is
the current stage number of interest. Another key variable, self->ctl, is called data;
this is the control block being operated on by the current coroutine. We typically are
simulating an operation in which data->x is being computed as a function of data->y
and data->z. The data record has many fields, as described earlier when we defined
control structures; for example, data->owner is the same as self, during the execution
stage, if it is nonnull.

This part of the simulator is written as if each functional unit is able to handle 
all 256 operations. In practice, of course, a functional unit tends to be much more 
specialized; the actual specialization is governed by the dispatcher, which issues an 
instruction only to a functional unit that supports it. Once an instruction has been 
dispatched, however, we can simulate it most easily if we imagine that its functional 
unit is universal. 

Coroutines with higher stage numbers are processed first. The three most important
variables that govern a coroutine’s behavior, once self->stage is given, are the
external operation code data->op, the internal operation code data->i, and the value of
data->state. We typically have data->state = 0 when a coroutine is first fired up.

(Local variables 12) +=
register coroutine *self; /* the current coroutine being executed */
register control *data; /* the control block of the current coroutine */

125. When a coroutine has done all it wants to on a single cycle, it says goto done.
It will not be scheduled to do any further work unless the schedule routine has been
called since it began execution. The wait macro is a convenient way to say “Please
schedule me to resume again at the current data->state” after a specified time; for
example, wait(1) will restart a coroutine on the next clock tick.

#define wait(t) { schedule(self, t, data->state); goto done; }
#define pass_after(t) schedule(self + 1, t, data->state)
#define sleep { self->next = self; goto done; } /* wait forever */
#define awaken(c, t) schedule(c, t, c->ctl->state)

(Execute all coroutines scheduled for the current time 125) =
cur_time++; if (cur_time == ring_size) cur_time = 0;
for (self = queuelist(cur_time); self != &sentinel; self = sentinel.next) {
  sentinel.next = self->next; self->next = NULL; /* unschedule this coroutine */
  data = self->ctl;
  if (verbose & coroutine_bit) {
    printf(" running "); print_coroutine_id(self); printf(" ");
    print_control_block(data); printf("\n");
  }
  switch (self->stage) {
  case 0: (Simulate an action of the fetch coroutine 288);
  case 1: (Simulate the first stage of an execution pipeline 130);
  default: (Simulate later stages of an execution pipeline 135);
  (Cases for control of special coroutines 126);
  }
terminate: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
done: ;
}

This code is used in section 64. 
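
The idea of running everything that falls due at the current tick can be sketched with a simple timing wheel. This is an assumption-laden simplification, not the simulator's data structure: the wheel type, wheel_schedule, and wheel_tick are hypothetical names, tasks run in FIFO order rather than by decreasing stage number, and delays must be positive and smaller than RING_SIZE.

```c
#include <stddef.h>

#define RING_SIZE 8   /* hypothetical; delays must satisfy 0 < delay < RING_SIZE */

typedef struct task {
  int id;
  struct task *next;  /* NULL when the task is not scheduled */
} task;

typedef struct {
  task *slot[RING_SIZE];  /* one queue per future clock tick */
  int cur_time;           /* current position in the ring */
} wheel;

/* Queue t to run delay ticks from now. */
static void wheel_schedule(wheel *w, task *t, int delay) {
  task **p = &w->slot[(w->cur_time + delay) % RING_SIZE];
  while (*p) p = &(*p)->next;   /* append at the end of that tick's queue */
  t->next = NULL;
  *p = t;
}

/* Advance the clock one tick and unschedule every task that is due.
   The ids of the first max tasks are stored in ran[]; the return value
   is the number of tasks that were due. */
static int wheel_tick(wheel *w, int *ran, int max) {
  int n = 0;
  task *t;
  w->cur_time++;
  if (w->cur_time == RING_SIZE) w->cur_time = 0;
  t = w->slot[w->cur_time];
  w->slot[w->cur_time] = NULL;
  while (t) {
    task *nx = t->next;
    t->next = NULL;             /* unschedule before running */
    if (n < max) ran[n] = t->id;
    n++;
    t = nx;
  }
  return n;
}
```

As in the section above, a task is unlinked before it runs, so it may reschedule itself (the analogue of wait) without disturbing the queue being traversed.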

126. A special coroutine whose stage number is vanish simply goes away at its 
scheduled time. 

( Cases for control of special coroutines 126 ) = 
case vanish: goto terminate; 

See also sections 215, 217, 222, 224, 232, 237, and 257. 

This code is used in section 125. 

127. (Global variables 20) +=
coroutine mem_locker; /* trivial coroutine that vanishes */
coroutine Dlocker; /* another */
control vanish_ctl; /* such coroutines share a common control block */



control = struct, §44.
coroutine = struct, §23.
coroutine_bit = 1 << 2, §8.
ctl: control *, §23.
cur_time: int, §29.
i: internal_opcode, §44.
lockloc: coroutine **, §23.
next: coroutine *, §23.
op: mmix_opcode, §44.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
queuelist: static coroutine *(), §35.
ring_size: int, §29.
schedule: static void (), §28.
sentinel: coroutine, §36.
stage: int, §23.
state: int, §44.
vanish = 98, §129.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.

128. (Initialize everything 22) +=
mem_locker.name = "Locker";
mem_locker.ctl = &vanish_ctl;
mem_locker.stage = vanish;
Dlocker.name = "Dlocker";
Dlocker.ctl = &vanish_ctl;
Dlocker.stage = vanish;
vanish_ctl.go.o.l = 4;
for (j = 0; j < DTcache->ports; j++) DTcache->reader[j].ctl = &vanish_ctl;
if (Dcache)
  for (j = 0; j < Dcache->ports; j++) Dcache->reader[j].ctl = &vanish_ctl;
for (j = 0; j < ITcache->ports; j++) ITcache->reader[j].ctl = &vanish_ctl;
if (Icache)
  for (j = 0; j < Icache->ports; j++) Icache->reader[j].ctl = &vanish_ctl;

129. Here is a list of the stage numbers for special coroutines to be defined below. 
(Header definitions 6) +=
#define max_stage 99 /* exceeds all stage numbers */
#define vanish 98 /* special coroutine that just goes away */
#define flush_to_mem 97 /* coroutine for flushing from a cache to memory */
#define flush_to_S 96 /* coroutine for flushing from a cache to the S-cache */
#define fill_from_mem 95 /* coroutine for filling a cache from memory */
#define fill_from_S 94 /* coroutine for filling a cache from the S-cache */
#define fill_from_virt 93 /* coroutine for filling a translation cache */
#define write_from_wbuf 92 /* coroutine for emptying the write buffer */
#define cleanup 91 /* coroutine for cleaning the caches */

130. At the very beginning of stage 1, a functional unit will stall if necessary until 
its operands are available. As soon as the operands are all present, the state is set 
nonzero and execution proper begins. 

(Simulate the first stage of an execution pipeline 130) =
switch1: switch (data->state) {
case 0: (Wait for input data if necessary; set state = 1 if it’s there 131);
case 1: (Begin execution of an operation 132);
case 2: (Pass data to the next stage of the pipeline 134);
case 3: (Finish execution of an operation 144);
(Special cases for states in the first stage 266);
}

This code is used in section 125.

131. If some of our input data has been computed by another coroutine on the
current cycle, we grab it now but wait for the next cycle. (An actual machine wouldn’t
have latched the data until then.)

(Wait for input data if necessary; set state = 1 if it’s there 131) =
j = 0;
if (data->y.p) {
  j++;
  if (data->y.p->known) data->y.o = data->y.p->o, data->y.p = NULL;
  else j += 10;
}
if (data->z.p) {
  j++;
  if (data->z.p->known) data->z.o = data->z.p->o, data->z.p = NULL;
  else j += 10;
}
if (data->b.p) {
  if (data->need_b) j++;
  if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
  else if (data->need_b) j += 10;
}
if (data->ra.p) {
  if (data->need_ra) j++;
  if (data->ra.p->known) data->ra.o = data->ra.p->o, data->ra.p = NULL;
  else if (data->need_ra) j += 10;
}
if (j < 10) data->state = 1;
if (j) wait(1); /* otherwise we fall through to case 1 */

This code is used in section 130. 
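
The counter j encodes two facts at once: a nonzero value means that some operand was still pending at the start of the cycle, and a value of 10 or more means that a needed operand is still unknown. A minimal sketch of that bookkeeping follows, assuming simplified producer and operand types and hypothetical helpers grab and collect; the rA operand is omitted for brevity.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

typedef struct { bool known; uint64_t o; } producer;  /* stand-in for a specnode */
typedef struct { uint64_t o; producer *p; } operand;  /* stand-in for a spec */

/* Contribution of one operand to j; copies the value if it is available. */
static int grab(operand *v, bool needed) {
  int j;
  if (!v->p) return 0;          /* value was already present */
  j = needed ? 1 : 0;           /* pending at the start of this cycle */
  if (v->p->known) { v->o = v->p->o; v->p = NULL; }
  else if (needed) j += 10;     /* still unavailable */
  return j;
}

/* Returns j; *state becomes 1 when all needed operands are present.
   A nonzero return means "wait one cycle" even if state is now 1. */
static int collect(operand *y, operand *z, operand *b, bool need_b, int *state) {
  int j = grab(y, true) + grab(z, true) + grab(b, need_b);
  if (j < 10) *state = 1;
  return j;
}
```

A result of 1 therefore means that an operand arrived during this very cycle: execution may begin, but only on the next tick.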



b: spec, §44.
ctl: control *, §23.
data: register control *, §124.
Dcache: cache *, §168.
Dlocker: coroutine, §127.
DTcache: cache *, §168.
go: specnode, §44.
Icache: cache *, §168.
ITcache: cache *, §168.
j: register int, §10.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
mem_locker: coroutine, §127.
name: char *, §23.
need_b: bool, §44.
need_ra: bool, §44.
o: octa, §40.
p: specnode *, §40.
ports: int, §167.
ra: spec, §44.
reader: coroutine *, §167.
stage: int, §23.
state: int, §44.
vanish_ctl: control, §127.
wait = macro (), §125.
y: spec, §44.
z: spec, §44.

132. Simple register-to-register instructions like ADD are assumed to take just one
cycle, but others like FADD almost certainly require more time. This simulator can be
configured so that FADD might take, say, four pipeline stages of one cycle each (1 + 1 +
1 + 1), or two pipeline stages of two cycles each (2 + 2), or a single unpipelined stage
lasting four cycles (4), etc. In any case the simulator computes the results now, for
simplicity, placing them in data->x and possibly also in data->a and/or data->interrupt.
The results will not be officially made known until the proper time.

(Begin execution of an operation 132) =
switch (data->i) {
(Cases to compute the results of register-to-register operation 137);
(Cases to compute the virtual address of a memory operation 265);
(Cases for stage 1 execution 155);
}
(Set things up so that the results become known when they should 133);

This code is used in section 130. 

133. If the internal opcode data->i is max_pipe_op or less, a special pipeline sequence
like 1 + 1 + 1 + 1 or 2 + 2 or 15 + 10, etc., has been configured. Otherwise we assume
that the pipeline sequence is simply 1.

Suppose the pipeline sequence is $t_1 + t_2 + \cdots + t_k$. Each $t_j$ is positive and less
than 256, so we represent the sequence as a string pipe_seq[data->i] of unsigned
“characters,” terminated by 0. Given such a string, we want to do the following: Wait
$(t_1 - 1)$ cycles and pass data to stage 2; wait $t_2$ cycles and pass data to stage 3; ...;
wait $t_{k-1}$ cycles and pass data to stage k; wait $t_k$ cycles and make the results known.

The value of denin is added to $t_1$; the value of denout is added to $t_k$.

(Set things up so that the results become known when they should 133) =
data->state = 3;
if (data->i <= max_pipe_op) { register unsigned char *s = pipe_seq[data->i];
  j = s[0] + data->denin;
  if (s[1]) data->state = 2; /* more than one stage */
  else j += data->denout;
  if (j > 1) wait(j - 1);
}
goto switch1;

This code is used in section 132. 
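
As a rough illustration of the pipeline-sequence encoding, the sketch below counts the stages in a zero-terminated sequence and adds up its cycles. The helper names stage_count and total_latency are hypothetical, and the total is an idealized figure that assumes no stage ever stalls because the next stage is occupied.

```c
/* Number of pipeline stages in a zero-terminated sequence t_1, ..., t_k. */
static int stage_count(const unsigned char *seq) {
  int k = 0;
  while (seq[k]) k++;
  return k;
}

/* Cycles from the start of execution until the results become known:
   t_1 + ... + t_k, plus denin (charged to t_1) and denout (charged to t_k). */
static int total_latency(const unsigned char *seq, int denin, int denout) {
  int total = denin + denout;
  const unsigned char *s;
  for (s = seq; *s; s++) total += *s;
  return total;
}
```

The three FADD configurations mentioned earlier (1 + 1 + 1 + 1, 2 + 2, and 4) all have a latency of four cycles under this count; they differ only in how many instructions can be in flight at once.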

134. When we’re in stage j, the coroutine for stage j + 1 of the same functional 
unit is self + 1. 

(Pass data to the next stage of the pipeline 134) =
pass_data: if ((self + 1)->next) wait(1); /* stall if the next stage is occupied */
  { register unsigned char *s = pipe_seq[data->i];
    j = s[self->stage];
    if (s[self->stage + 1] == 0) j += data->denout, data->state = 3;
      /* the next stage is the last */
    pass_after(j);
  }
passit: (self + 1)->ctl = data;
  data->owner = self + 1;
  goto done;

This code is used in section 130. 

135. (Simulate later stages of an execution pipeline 135) =
switch2: if (data->b.p && data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
switch (data->state) {
case 0: panic(confusion("switch2"));
case 1: (Begin execution of a stage-two operation 351);
case 2: goto pass_data;
case 3: goto fin_ex;
(Special cases for states in later stages 272);
}

This code is used in section 125. 

136. The default pipeline times use only one stage; they can be overridden by
MMIX_config. The total number of stages supported by this simulator is limited to
90, since it must never interfere with the stage numbers for special coroutines defined
below. (The author doesn’t feel guilty about making this restriction.)

(External variables 4) +=
#define pipe_limit 90
Extern unsigned char pipe_seq[max_pipe_op + 1][pipe_limit + 1];

137. The simplest of all register-to-register operations is set, which occurs for
commands like SETH as well as for commands like GETA. (We might as well start
with the easy cases and work our way up.)

(Cases to compute the results of register-to-register operation 137) =
case set: data->x.o = data->z.o; break;

See also sections 138, 139, 140, 141, 142, 143, 343, 344, 345, 346, 348, and 350. 

This code is used in section 132. 



a: specnode, §44.
b: spec, §44.
confusion = macro (), §13.
ctl: control *, §23.
data: register control *, §124.
denin: int, §44.
denout: int, §44.
done: label, §125.
Extern = macro, §4.
fin_ex: label, §144.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
known: bool, §40.
max_pipe_op = feps, §49.
MMIX_config: void (), MMIX-CONFIG §38.
next: coroutine *, §23.
o: octa, §40.
owner: coroutine *, §44.
p: specnode *, §40.
panic = macro (), §13.
pass_after = macro (), §125.
self: register coroutine *, §124.
set = 33, §49.
stage: int, §23.
state: int, §44.
switch1: label, §130.
wait = macro (), §125.
x: specnode, §44.
z: spec, §44.

138. Here are the basic boolean operations, which account for 24 of MMIX’s 256
opcodes.

(Cases to compute the results of register-to-register operation 137) +=
case or: data->x.o.h = data->y.o.h | data->z.o.h;
  data->x.o.l = data->y.o.l | data->z.o.l;
  break;
case orn: data->x.o.h = data->y.o.h | ~data->z.o.h;
  data->x.o.l = data->y.o.l | ~data->z.o.l;
  break;
case nor: data->x.o.h = ~(data->y.o.h | data->z.o.h);
  data->x.o.l = ~(data->y.o.l | data->z.o.l);
  break;
case and: data->x.o.h = data->y.o.h & data->z.o.h;
  data->x.o.l = data->y.o.l & data->z.o.l;
  break;
case andn: data->x.o.h = data->y.o.h & ~data->z.o.h;
  data->x.o.l = data->y.o.l & ~data->z.o.l;
  break;
case nand: data->x.o.h = ~(data->y.o.h & data->z.o.h);
  data->x.o.l = ~(data->y.o.l & data->z.o.l);
  break;
case xor: data->x.o.h = data->y.o.h ^ data->z.o.h;
  data->x.o.l = data->y.o.l ^ data->z.o.l;
  break;
case nxor: data->x.o.h = data->y.o.h ^ ~data->z.o.h;
  data->x.o.l = data->y.o.l ^ ~data->z.o.l;
  break;

139. The implementation of ADDU is only slightly more difficult. It would be trivial
except for the fact that internal opcode addu is used not only for the ADDU[I] and
INC[M][H,L] operations, in which we simply want to add data->y.o to data->z.o, but
also for operations like 4ADDU.

(Cases to compute the results of register-to-register operation 137) +=
case addu: data->x.o = oplus((data->op & 0xf8) == 0x28 ?
    shift_left(data->y.o, 1 + ((data->op >> 1) & 0x3)) : data->y.o, data->z.o);
  break;
case subu: data->x.o = ominus(data->y.o, data->z.o); break;
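
The opcode test in the addu case can be checked in isolation. The sketch below uses a native 64-bit integer in place of an octa, and the names addu_shift and addu are hypothetical; it relies on the scaled-add opcodes 2ADDU, 4ADDU, 8ADDU, and 16ADDU occupying 0x28 through 0x2f, with the low bit selecting the immediate form.

```c
#include <stdint.h>

/* Shift count applied to y for the external opcode op:
   0 for plain additions, 1..4 for 2ADDU, 4ADDU, 8ADDU, 16ADDU. */
static unsigned addu_shift(unsigned op) {
  return (op & 0xf8) == 0x28 ? 1 + ((op >> 1) & 0x3) : 0;
}

/* Unsigned addition modulo 2^64, with y scaled as the opcode demands. */
static uint64_t addu(unsigned op, uint64_t y, uint64_t z) {
  return (y << addu_shift(op)) + z;
}
```

For instance, 4ADDU (opcode 0x2a) applied to y = 3 and z = 5 computes 4·3 + 5 = 17.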

140. Signed addition and subtraction produce the same results as their unsigned
counterparts, but overflow must also be detected. Overflow occurs when adding y to z
if and only if y and z have the same sign but their sum has a different sign. Overflow
occurs in the calculation x = y − z if and only if it occurs in the calculation y = x + z.

(Cases to compute the results of register-to-register operation 137) +=
case add: data->x.o = oplus(data->y.o, data->z.o);
  if (((data->y.o.h ^ data->z.o.h) & sign_bit) == 0 &&
      ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0)
    data->interrupt |= V_BIT;
  break;
case sub: data->x.o = ominus(data->y.o, data->z.o);
  if (((data->x.o.h ^ data->z.o.h) & sign_bit) == 0 &&
      ((data->y.o.h ^ data->x.o.h) & sign_bit) != 0)
    data->interrupt |= V_BIT;
  break;
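
The two overflow rules translate directly to 64-bit words. The following sketch is an illustration with hypothetical names (SIGN64, add_overflows, sub_overflows); it works on uint64_t bit patterns so that the wraparound is well defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

#define SIGN64 (UINT64_C(1) << 63)

/* x = y + z (mod 2^64); overflow iff y and z agree in sign but x does not. */
static bool add_overflows(uint64_t y, uint64_t z, uint64_t *x) {
  *x = y + z;
  return ((y ^ z) & SIGN64) == 0 && ((y ^ *x) & SIGN64) != 0;
}

/* x = y - z (mod 2^64); overflow iff it occurs in y = x + z,
   that is, x and z agree in sign but y does not. */
static bool sub_overflows(uint64_t y, uint64_t z, uint64_t *x) {
  *x = y - z;
  return ((*x ^ z) & SIGN64) == 0 && ((y ^ *x) & SIGN64) != 0;
}
```

Adding 1 to the largest positive value overflows, for example, while subtracting the most negative value from −1 does not, because the true result still fits in 64 bits.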

141. The shift commands might take more than one cycle, or they might even be
pipelined, if the default value of pipe_seq[sh] is changed. But we compute shifts all at
once here, because other parts of the simulator will take care of the pipeline timing.
(Notice that shlu is changed to sh, for this reason. Similar changes to the internal op
codes are made for other operators below.)

#define shift_amt (data->z.o.h || data->z.o.l >= 64 ? 64 : data->z.o.l)

(Cases to compute the results of register-to-register operation 137) +=
case shlu: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh; break;
case shl: data->x.o = shift_left(data->y.o, shift_amt); data->i = sh;
  { octa tmpo;
    tmpo = shift_right(data->x.o, shift_amt, 0);
    if (tmpo.h != data->y.o.h || tmpo.l != data->y.o.l) data->interrupt |= V_BIT;
  } break;
case shru: data->x.o = shift_right(data->y.o, shift_amt, 1); data->i = sh; break;
case shr: data->x.o = shift_right(data->y.o, shift_amt, 0); data->i = sh; break;

142. The MUX operation has three operands, namely data->y, data->z, and data->b;
the third operand is the current (speculative) value of rM, the special mask register.
Otherwise MUX is unexceptional.

(Cases to compute the results of register-to-register operation 137) +=
case mux: data->x.o.h = (data->y.o.h & data->b.o.h) + (data->z.o.h & ~data->b.o.h);
  data->x.o.l = (data->y.o.l & data->b.o.l) + (data->z.o.l & ~data->b.o.l);
  break;



add = 29, §49.
addu = 30, §49.
and = 37, §49.
andn = 38, §49.
b: spec, §44.
data: register control *, §124.
h: tetra, §17.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
l: tetra, §17.
mux = 11, §49.
nand = 39, §49.
nor = 36, §49.
nxor = 41, §49.
o: octa, §40.
octa = struct, §17.
ominus: octa (), MMIX-ARITH §5.
op: mmix_opcode, §44.
oplus: octa (), MMIX-ARITH §5.
or = 34, §49.
orn = 35, §49.
pipe_seq: unsigned char [][], §136.
sh = 10, §49.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
shl = 44, §49.
shlu = 42, §49.
shr = 45, §49.
shru = 43, §49.
sign_bit = macro, §80.
sub = 31, §49.
subu = 32, §49.
V_BIT = 1 << 14, §54.
x: specnode, §44.
xor = 40, §49.
y: spec, §44.
z: spec, §44.

143. Comparisons are a breeze.

(Cases to compute the results of register-to-register operation 137) +=
case cmp: if ((data->y.o.h & sign_bit) > (data->z.o.h & sign_bit)) goto cmp_neg;
  if ((data->y.o.h & sign_bit) < (data->z.o.h & sign_bit)) goto cmp_pos;
case cmpu: if (data->y.o.h < data->z.o.h) goto cmp_neg;
  if (data->y.o.h > data->z.o.h) goto cmp_pos;
  if (data->y.o.l < data->z.o.l) goto cmp_neg;
  if (data->y.o.l > data->z.o.l) goto cmp_pos;
cmp_zero: break; /* data->x is zero */
cmp_pos: data->x.o.l = 1; break; /* data->x.o.h is zero */
cmp_neg: data->x.o = neg_one; break;
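
The trick in the cmp case is that once the sign bits agree, signed and unsigned comparison of two's complement numbers give the same answer, so the code can fall through to cmpu. A small sketch of the same logic follows; the pair type and the function names are illustrative stand-ins for the simulator's octa and its cases, and the functions return −1, 0, or +1 instead of storing into data->x.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } pair;  /* stand-in for octa: high and low tetra */

#define SIGN32 UINT32_C(0x80000000)

/* Unsigned comparison: high halves first, then low halves. */
static int compare_unsigned(pair y, pair z) {
  if (y.h < z.h) return -1;
  if (y.h > z.h) return 1;
  if (y.l < z.l) return -1;
  if (y.l > z.l) return 1;
  return 0;
}

/* Signed comparison: a set sign bit means "smaller"; equal signs
   reduce to the unsigned case. */
static int compare_signed(pair y, pair z) {
  if ((y.h & SIGN32) > (z.h & SIGN32)) return -1;
  if ((y.h & SIGN32) < (z.h & SIGN32)) return 1;
  return compare_unsigned(y, z);
}
```

Thus the all-ones pattern compares below 1 as a signed number (it is −1) but above 1 as an unsigned number.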

144. The other operations will be deferred until later, now that we understand the 
basic ideas. But one more piece of code ought to be written before we move on, 
because it completes the execution stage for the simple cases already considered. 

The ren_x and ren_a fields tell us whether the x and/or a fields contain valid 
information that should become officially known. 

(Finish execution of an operation 144) =
fin_ex: if (data->ren_x) data->x.known = true;
  else if (data->mem_x) {
    data->x.known = true;
    if (!(data->x.addr.h & 0xffff0000)) data->x.addr.l &= -8;
  }
  if (data->ren_a) data->a.known = true;
  if (data->loc.h & sign_bit) data->ra.o.l = 0;
    /* no trips enabled for the operating system */
  if (data->interrupt & 0xffff) (Handle interrupt at end of execution stage 307);
die: data->owner = NULL; goto terminate; /* this coroutine now fades away */

This code is used in section 130.

145. The commission/deissue stage. Control blocks leave the reorder buffer
either at the hot end (when they’re committed) or at the cool end (when they’re
deissued). We hope most of them are committed, but from time to time our
speculation is incorrect and we must deissue a sequence of instructions that prove to
be unwanted. Deissuing must take priority over committing, because the dispatcher
cannot do anything until the machine’s cool state has stabilized.

Deissuing changes the cool state by undoing the most recently issued instructions,
in reverse order. Committing changes the hot state by doing the least recently
issued instructions, in their original order. Both operations are similar, so we assume
that they take the same time; at most commit_max instructions are deissued and/or
committed on each clock cycle.

(Deissue the coolest instruction 145) =
{
  cool = (cool == reorder_top ? reorder_bot : cool + 1);
  if (verbose & issue_bit) {
    printf("Deissuing "); print_control_block(cool);
    if (cool->owner) { printf(" "); print_coroutine_id(cool->owner); }
    printf("\n");
  }
  if (cool->ren_x) rename_regs++, spec_rem(&cool->x);
  if (cool->ren_a) rename_regs++, spec_rem(&cool->a);
  if (cool->mem_x) mem_slots++, spec_rem(&cool->x);
  if (cool->set_l) spec_rem(&cool->rl);
  if (cool->owner) {
    if (cool->owner->lockloc) *(cool->owner->lockloc) = NULL, cool->owner->lockloc = NULL;
    if (cool->owner->next) unschedule(cool->owner);
  }
  cool_O = cool->cur_O; cool_S = cool->cur_S;
  deissues--;
}

This code is used in section 67. 



a: specnode, §44.
addr: octa, §40.
cmp = 46, §49.
cmpu = 47, §49.
commit_max: int, §59.
cool: control *, §60.
cool_O: octa, §98.
cool_S: octa, §98.
cur_O: octa, §44.
cur_S: octa, §44.
data: register control *, §124.
deissues: int, §60.
h: tetra, §17.
interrupt: unsigned int, §44.
issue_bit = 1 << 0, §8.
known: bool, §40.
l: tetra, §17.
loc: octa, §44.
lockloc: coroutine **, §23.
mem_slots: int, §86.
mem_x: bool, §44.
neg_one: octa, MMIX-ARITH §4.
next: coroutine *, §23.
o: octa, §40.
owner: coroutine *, §44.
print_control_block: static void (), §46.
print_coroutine_id: static void (), §25.
printf: int (), <stdio.h>.
ra: spec, §44.
ren_a: bool, §44.
ren_x: bool, §44.
rename_regs: int, §86.
reorder_bot: control *, §60.
reorder_top: control *, §60.
rl: specnode, §44.
set_l: bool, §44.
sign_bit = macro, §80.
spec_rem: static void (), §97.
terminate: label, §125.
true = 1, §11.
unschedule: static void (), §33.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.

146. (Commit the hottest instruction, or break if it’s not ready 146) =
{
  if (nullifying) (Nullify the hottest instruction 147)
  else {
    if (hot->i == get && hot->zz == rQ) new_Q = oandn(g[rQ].o, hot->x.o);
    else if (hot->i == put && hot->xx == rQ) hot->x.o.h |= new_Q.h, hot->x.o.l |= new_Q.l;
    if (hot->mem_x) (Commit to memory if possible, otherwise break 256);
    if (hot->stack_alert) stack_overflow = true;
    else if (stack_overflow && !hot->interim) {
      g[rQ].o.l |= STACK_OVERFLOW, new_Q.l |= STACK_OVERFLOW, stack_overflow = false;
      if (verbose & issue_bit) {
        printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
      }
    }
    if (verbose & issue_bit) {
      printf("Committing "); print_control_block(hot); printf("\n");
    }
    if (hot->ren_x) rename_regs++, hot->x.up->o = hot->x.o, spec_rem(&(hot->x));
    if (hot->ren_a) rename_regs++, hot->a.up->o = hot->a.o, spec_rem(&(hot->a));
    if (hot->set_l) hot->rl.up->o = hot->rl.o, spec_rem(&(hot->rl));
    if (hot->arith_exc) g[rA].o.l |= hot->arith_exc;
    if (hot->usage) {
      g[rU].o.l++; if (g[rU].o.l == 0) {
        g[rU].o.h++; if ((g[rU].o.h & 0x7fff) == 0) g[rU].o.h -= 0x8000;
      }
    }
  }
  if (hot->interrupt >= H_BIT) (Begin an interruption and break 317);
}

This code is used in section 67. 

147. A load or store instruction is “nullified” if it is about to be captured by a 
trap interrupt. In such cases it will be the only item in the reorder buffer; thus 
nullifying is sort of a cross between deissuing and committing. (It is important to 
have stopped dispatching when nullification is necessary, because instructions such 
as incgamma and decgamma change rS, and we need to change it back when an 
unexpected interruption occurs.) 

(Nullify the hottest instruction 147) =
{
  if (verbose & issue_bit) {
    printf("Nullifying "); print_control_block(hot); printf("\n");
  }
  if (hot->ren_x) rename_regs++, spec_rem(&hot->x);
  if (hot->ren_a) rename_regs++, spec_rem(&hot->a);
  if (hot->mem_x) mem_slots++, spec_rem(&hot->x);
  if (hot->set_l) spec_rem(&hot->rl);
  cool_O = hot->cur_O, cool_S = hot->cur_S;
  nullifying = false;
}

This code is used in section 146. 

148. Interrupt bits in rQ might be lost if they are set between a GET and a PUT.
Therefore we don’t allow PUT to zero out bits that have become 1 since the most
recently committed GET.

⟨Global variables 20⟩ +≡
  octa new_Q;   /* when rQ increases in any bit position, so should this */
  bool stack_overflow;   /* stack overflow not yet reported */
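
The protection of rQ can be sketched in isolation. The following is a minimal model under stated assumptions, not the simulator's own code: rQ is collapsed into a single 64-bit word, and the names rQ_model, new_Q_model, raise_interrupt, commit_get, and commit_put are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for g[rQ].o and new_Q, collapsed to 64-bit words. */
static uint64_t rQ_model = 0;
static uint64_t new_Q_model = 0;

/* An interrupt request sets bits in rQ; new_Q records every new 1 bit. */
static void raise_interrupt(uint64_t bits) {
  rQ_model |= bits;
  new_Q_model |= bits;
}

/* A GET that read the value `seen` is committed:
   new_Q keeps only the bits of rQ that the program has not seen. */
static void commit_get(uint64_t seen) {
  new_Q_model = rQ_model & ~seen;
}

/* A PUT is committed: bits that arrived since the last committed GET
   are or-ed back in, so the PUT cannot zero them out. */
static uint64_t commit_put(uint64_t value) {
  rQ_model = value | new_Q_model;
  assert((new_Q_model & ~rQ_model) == 0);  /* new_Q is always a subset of rQ */
  return rQ_model;
}

/* Usage: a request that arrives between GET and PUT survives the PUT. */
static void rQ_example(void) {
  rQ_model = new_Q_model = 0;
  raise_interrupt(0x3);          /* two requests are pending */
  commit_get(0x3);               /* the program has seen both */
  raise_interrupt(0x4);          /* a third arrives before the PUT */
  assert(commit_put(0) == 0x4);  /* PUT 0 clears only what was seen */
}
```

In this model the program intends to clear rQ completely, yet the late request remains visible afterward.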



a: specnode, §44.
arith_exc: unsigned int, §44.
bool = enum, §11.
cool_O: octa, §98.
cool_S: octa, §98.
cur_O: octa, §44.
cur_S: octa, §44.
decgamma = 85, §49.
false = 0, §11.
g: specnode [], §86.
get = 54, §49.
h: tetra, §17.
H_BIT = 1 << 16, §54.
hot: control *, §60.
i: internal_opcode, §44.
incgamma = 84, §49.
interim: bool, §44.
interrupt: unsigned int, §44.
issue_bit = 1 << 0, §8.
l: tetra, §17.
mem_slots: int, §86.
mem_x: bool, §44.
nullifying: bool, §315.
o: octa, §40.
oandn: octa (), MMIX-ARITH §25.
octa = struct, §17.
print_control_block: static void (), §46.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
put = 55, §49.
rA = 21, §52.
ren_a: bool, §44.
ren_x: bool, §44.
rename_regs: int, §86.
rl: specnode, §44.
rQ = 16, §52.
rU = 17, §52.
set_l: bool, §44.
spec_rem: static void (), §97.
stack_alert: bool, §44.
STACK_OVERFLOW = 1 << 7, §57.
true = 1, §11.
up: specnode *, §40.
usage: bool, §44.
verbose: int, §4.
x: specnode, §44.
xx: unsigned char, §44.
zz: unsigned char, §44.
149. An instruction will not be committed immediately if it violates the basic
security rule of MMIX: An instruction in a nonnegative location should not be performed
unless all eight of the internal interrupts have been enabled in the interrupt mask
register rK. Conversely, an instruction in a negative location should not be performed
if the P_BIT is enabled in rK.

Such instructions take one extra cycle before they are committed. The
nonnegative-location case turns on the S_BIT of both rK and rQ, leading to an immediate
interrupt (unless the current instruction is trap, put, or resume).

⟨Check for security violation, break if so 149⟩ ≡
{
  if (hot->loc.h & sign_bit) {
    if ((g[rK].o.h & P_BIT) && !(hot->interrupt & P_BIT)) {
      hot->interrupt |= P_BIT;
      g[rQ].o.h |= P_BIT;
      new_Q.h |= P_BIT;
      if (verbose & issue_bit) {
        printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
      }
      break;
    }
  } else if ((g[rK].o.h & 0xff) != 0xff && !(hot->interrupt & S_BIT)) {
    hot->interrupt |= S_BIT;
    g[rQ].o.h |= S_BIT;
    new_Q.h |= S_BIT;
    g[rK].o.h |= S_BIT;
    if (verbose & issue_bit) {
      printf(" setting rQ="); print_octa(g[rQ].o);
      printf(", rK="); print_octa(g[rK].o); printf("\n");
    }
    break;
  }
}

This code is used in section 67. 
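
The security rule itself is a pure function of the instruction's location and the internal-interrupt byte of rK. Here is a minimal sketch of that decision alone. It assumes that sign_bit is the top bit of a 32-bit word, and it omits the test that keeps an already flagged instruction from being flagged twice; the name security_check is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define P_BIT (1u << 0)       /* privileged-instruction bit, as in §54 */
#define S_BIT (1u << 1)       /* security-violation bit, as in §54 */
#define SIGN_BIT 0x80000000u  /* assumed value of the sign_bit macro */

/* Returns the interrupt bit that the security rule would raise, or 0.
   loc_h is the high tetrabyte of the instruction's location;
   rK_h is the high tetrabyte of the interrupt mask register rK. */
static unsigned security_check(uint32_t loc_h, uint32_t rK_h) {
  if (loc_h & SIGN_BIT)                        /* negative location */
    return (rK_h & P_BIT) ? P_BIT : 0;
  return ((rK_h & 0xff) != 0xff) ? S_BIT : 0;  /* nonnegative location */
}

/* Usage: the four cases of the rule. */
static void security_check_example(void) {
  assert(security_check(0x80000000u, P_BIT) == P_BIT);
  assert(security_check(0x80000000u, 0) == 0);
  assert(security_check(0, 0xff) == 0);     /* all eight bits enabled */
  assert(security_check(0, 0xfe) == S_BIT); /* one bit is missing */
}
```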






MMIX-PIPE: BRANCH PREDICTION 



150. Branch prediction. An MMIX programmer distinguishes statically between 
“branches” and “probable branches,” but many modern computers attempt to do 
better by implementing dynamic branch prediction. (See, for example, section 4.3 
of Hennessy and Patterson’s Computer Architecture, second edition.) Experience 
has shown that dynamic branch prediction can significantly improve the performance 
of speculative execution, by reducing the number of instructions that need to be 
deissued. 

This simulator has an optional bp_table containing $2^{a+b+c}$ entries of n bits each,
where n is between 1 and 8. Usually n is 1 or 2 in practice, but 8 bits are allocated
per entry for convenience in this program. The bp_table is consulted and updated on
every branch instruction (every B or PB instruction, but not JMP), for advice on past
history of similar situations. It is indexed by the a least significant bits of the address
of the instruction, the b most recent bits of global branch history, and the next c bits
of both address and history (exclusive-ored).

A bp_table entry begins at zero and is regarded as a signed n-bit number. If it
is nonnegative, we will follow the prediction in the instruction, namely to predict 
a branch taken only in the PB case. If it is negative, we will predict the opposite 
of the instruction’s recommendation. The n-bit number is increased (if possible) if 
the instruction’s prediction was correct, decreased (if possible) if the instruction’s 
prediction was incorrect. 

(Incidentally, a large value of n is not necessarily a good idea. For example, if 
n = 8 the machine might need 128 steps to recognize that a branch taken the first 
150 times is not taken the next 150 times. And if we modify the update criteria to 
avoid this problem, we obtain a scheme that is rarely better than a simple scheme 
with smaller n.) 

The values a, b, c, and n in this discussion are called bp_a, bp_b, bp_c, and bp_n in
the program.

⟨External variables 4⟩ +≡
  Extern int bp_a, bp_b, bp_c, bp_n;   /* parameters for branch prediction */
  Extern char *bp_table;   /* either NULL or an array of 2^(a+b+c) items */
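
The remark about large n can be checked with a small experiment. The sketch below models a single table entry as a saturating signed n-bit counter; the function names are hypothetical, and the update rule is our reading of the description above, not a quotation of the simulator.

```c
#include <assert.h>

/* Move an n-bit entry h one step upward (the instruction's hint was right). */
static int bp_step_up(int h, int n) {
  int nmask = (1 << n) - 1, npower = 1 << (n - 1);
  int up = (h + 1) & nmask;
  return up == npower ? h : up;   /* stay put at the maximum, 2^(n-1) - 1 */
}

/* Move an n-bit entry h one step downward (the hint was wrong). */
static int bp_step_down(int h, int n) {
  int nmask = (1 << n) - 1, npower = 1 << (n - 1);
  return h == npower ? h : ((h - 1) & nmask);  /* stay put at -2^(n-1) */
}

/* The entry is negative when its sign bit, 2^(n-1), is set. */
static int bp_is_negative(int h, int n) {
  return (h & (1 << (n - 1))) != 0;
}

/* After `warmup` correct hints, count the wrong hints needed
   before the entry becomes negative and the prediction flips. */
static int steps_to_flip(int n, int warmup) {
  int h = 0, steps = 0, i;
  assert(n >= 1 && n <= 8);
  for (i = 0; i < warmup; i++) h = bp_step_up(h, n);
  while (!bp_is_negative(h, n)) {
    h = bp_step_down(h, n);
    steps++;
  }
  return steps;
}

/* Usage: the example in the text, and the same history with small n. */
static void bp_counter_example(void) {
  assert(steps_to_flip(8, 150) == 128);
  assert(steps_to_flip(2, 150) == 2);
  assert(steps_to_flip(1, 150) == 1);
}
```

With n = 8 the entry saturates at 127 during the first 150 branches, so 128 further mispredictions pass before the advice changes; with n = 2 only two are needed.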



Extern = macro, §4.
g: specnode [], §86.
h: tetra, §17.
hot: control *, §60.
interrupt: unsigned int, §44.
issue_bit = 1 << 0, §8.
loc: octa, §44.
new_Q: octa, §148.
o: octa, §40.
P_BIT = 1 << 0, §54.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
put = 55, §49.
resume = 76, §49.
rK = 15, §52.
rQ = 16, §52.
S_BIT = 1 << 1, §54.
sign_bit = macro, §80.
trap = 82, §49.
verbose: int, §4.
151. Branch prediction is made when we are either about to issue an instruction or
peeking ahead. We look at the bp_table, but we don’t want to update it yet.

⟨Predict a branch outcome 151⟩ ≡
{
  predicted = op & 0x10;   /* start with the instruction’s recommendation */
  if (bp_table) { register int h;
    m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
    m = ((cool_hist & bp_bcmask) << bp_a) ^ (m >> 2);
    h = bp_table[m];
    if (h & bp_npower) predicted ^= 0x10;
  }
  if (predicted) peek_hist = (peek_hist << 1) + 1;
  else peek_hist <<= 1;
}

This code is used in section 85.

152. We update the bp_table when an instruction is issued. And we store the
opposite table value in cool->x.o.l, just in case our prediction turns out to be wrong.

⟨Record the result of branch prediction 152⟩ ≡
if (bp_table) { register int reversed, h, h_up, h_down;
  reversed = op & 0x10;
  if (peek_hist & 1) reversed ^= 0x10;
  m = ((head->loc.l & bp_cmask) << bp_b) + (head->loc.l & bp_amask);
  m = ((cool_hist & bp_bcmask) << bp_a) ^ (m >> 2);
  h = bp_table[m];
  h_up = (h + 1) & bp_nmask; if (h_up == bp_npower) h_up = h;
  if (h == bp_npower) h_down = h; else h_down = (h - 1) & bp_nmask;
  if (reversed) {
    bp_table[m] = h_down, cool->x.o.l = h_up;
    cool->i = pbr + br - cool->i;   /* reverse the sense */
    bp_rev_stat++;
  } else {
    bp_table[m] = h_up, cool->x.o.l = h_down;   /* go with the flow */
    bp_ok_stat++;
  }
  if (verbose & show_pred_bit) {
    printf(" predicting "); print_octa(cool->loc);
    printf(" %s; bp[%x]=%d\n", reversed ? "NG" : "OK", m,
        bp_table[m] - ((bp_table[m] & bp_npower) << 1));
  }
  cool->x.o.h = m;
}

This code is used in section 75.






153. The calculations in the previous sections need several precomputed constants, 
depending on the parameters a, b, c, and n. 

⟨Initialize everything 22⟩ +≡
  bp_amask = ((1 << bp_a) - 1) << 2;   /* least a bits of instruction address */
  bp_cmask = ((1 << bp_c) - 1) << (bp_a + 2);   /* the next c address bits */
  bp_bcmask = (1 << (bp_b + bp_c)) - 1;   /* least b + c bits of history info */
  bp_nmask = (1 << bp_n) - 1;   /* least significant n bits */
  bp_npower = 1 << (bp_n - 1);   /* 2^(n-1), the sign bit of an n-bit number */
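
To see how these masks combine, here is a self-contained sketch of the table index computation. It assumes 32-bit quantities and passes a, b, and c as arguments instead of using the global parameters; the name bp_index is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Index into a table of 2^(a+b+c) entries, given the low tetrabyte of the
   instruction address and the global branch history. */
static uint32_t bp_index(uint32_t loc_l, uint32_t hist, int a, int b, int c) {
  uint32_t amask = ((1u << a) - 1) << 2;        /* least a address bits */
  uint32_t cmask = ((1u << c) - 1) << (a + 2);  /* the next c address bits */
  uint32_t bcmask = (1u << (b + c)) - 1;        /* least b + c history bits */
  uint32_t m;
  assert(a >= 0 && b >= 0 && c >= 0 && a + b + c < 30);
  m = ((loc_l & cmask) << b) + (loc_l & amask); /* leave a gap of b bits */
  return ((hist & bcmask) << a) ^ (m >> 2);     /* history fills the gap */
}

/* Usage: a = 2, b = 1, c = 1 gives a 16-entry table. */
static void bp_index_example(void) {
  assert(bp_index(0x1c, 0, 2, 1, 1) == 0xb);
  assert(bp_index(0x1c, 1, 2, 1, 1) == 0xf);
  assert(bp_index(0x1c, 2, 2, 1, 1) == 0x3);
  assert(bp_index(0x1c, 3, 2, 1, 1) == 0x7);
}
```

The two low-order bits of the address are discarded because instructions are tetrabyte-aligned. The low a bits of the index then come from the address alone, the next b bits from the history alone, and the top c bits from the exclusive-or of both.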

154. ⟨Global variables 20⟩ +≡
  int bp_amask, bp_cmask, bp_bcmask, bp_nmask, bp_npower;
  int bp_rev_stat, bp_ok_stat;   /* how often we overrode and agreed */
  int bp_bad_stat, bp_good_stat;   /* how often we failed and succeeded */

155. After a branch or probable branch instruction has been issued and the value
of the relevant register has been computed in the reorder buffer as data->b.o, we’re
ready to determine if the prediction was correct or not.

⟨Cases for stage 1 execution 155⟩ ≡
case br: case pbr: j = register_truth(data->b.o, data->op);
  if (j) data->go.o = data->z.o; else data->go.o = data->y.o;
  if (j == (data->i == pbr)) bp_good_stat++;
  else {   /* oops, misprediction */
    bp_bad_stat++;
    ⟨Recover from incorrect branch prediction 160⟩;
  }
  goto fin_ex;

See also sections 313, 325, 327, 328, 329, 331, and 356. 

This code is used in section 132. 



b: spec, §44.
bp_a: int, §150.
bp_b: int, §150.
bp_c: int, §150.
bp_n: int, §150.
bp_table: char *, §150.
br = 69, §49.
cool: control *, §60.
cool_hist: unsigned int, §99.
data: register control *, §124.
fin_ex: label, §144.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
i: internal_opcode, §44.
j: register int, §12.
l: tetra, §17.
loc: octa, §68.
loc: octa, §44.
m: register int, §12.
o: octa, §40.
op: register mmix_opcode, §75.
op: mmix_opcode, §44.
pbr = 70, §49.
peek_hist: unsigned int, §99.
predicted: register int, §85.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
register_truth: static int (), §157.
show_pred_bit = 1 << 7, §8.
verbose: int, §4.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
156. The register_truth subroutine is used by B, PB, CS, and ZS commands to decide
whether an octabyte satisfies the conditions of the opcode, data->op.

⟨Internal prototypes 13⟩ +≡
  static int register_truth ARGS((octa, mmix_opcode));

157. ⟨Subroutines 14⟩ +≡
  static int register_truth(o, op)
    octa o;
    mmix_opcode op;
  { register int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break;   /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break;   /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break;   /* positive? */
    case 3: b = o.l & 0x1; break;   /* odd? */
    }
    if (op & 0x8) return b ^ 1;
    else return b;
  }
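
The same decision can be tried out on a plain 64-bit word. This sketch is an illustration only: it replaces the two-tetrabyte octa by uint64_t, and the name register_truth64 is hypothetical. The opcode values in the example are those of MMIX's branch instructions (BN = #40, BZ = #42, BP = #44, BOD = #46, and their negations BNN = #48, BNZ = #4a, BNP = #4c, BEV = #4e).

```c
#include <assert.h>
#include <stdint.h>

/* Does the octabyte o satisfy the condition encoded in opcode op?
   Bits 1 and 2 of op select the test; bit 3 negates it. */
static int register_truth64(uint64_t o, unsigned op) {
  int b;
  switch ((op >> 1) & 0x3) {
  case 0: b = (int)(o >> 63); break;               /* negative? */
  case 1: b = (o == 0); break;                     /* zero? */
  case 2: b = ((o >> 63) == 0 && o != 0); break;   /* positive? */
  default: b = (int)(o & 1); break;                /* odd? */
  }
  return (op & 0x8) ? b ^ 1 : b;
}

/* Usage: one check per branch opcode. */
static void register_truth_example(void) {
  assert(register_truth64(UINT64_MAX, 0x40) == 1);  /* BN: -1 is negative */
  assert(register_truth64(0, 0x42) == 1);           /* BZ: 0 is zero */
  assert(register_truth64(5, 0x44) == 1);           /* BP: 5 is positive */
  assert(register_truth64(5, 0x46) == 1);           /* BOD: 5 is odd */
  assert(register_truth64(0, 0x48) == 1);           /* BNN: 0 is nonnegative */
  assert(register_truth64(0, 0x4a) == 0);           /* BNZ: 0 is not nonzero */
  assert(register_truth64(5, 0x4c) == 0);           /* BNP: 5 is positive */
  assert(register_truth64(4, 0x4e) == 1);           /* BEV: 4 is even */
}
```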

158. The issued_between subroutine determines how many speculative instructions
were issued between a given control block in the reorder buffer and the current cool
pointer, when cc = cool.

⟨Internal prototypes 13⟩ +≡
  static int issued_between ARGS((control *, control *));

159. ⟨Subroutines 14⟩ +≡
  static int issued_between(c, cc)
    control *c, *cc;
  {
    if (c > cc) return c - 1 - cc;
    return (c - reorder_bot) + (reorder_top - cc);
  }
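
A small model makes the two cases concrete. We assume here, as the formula suggests, that the reorder buffer is a circular array whose pointers move downward and wrap from the bottom to the top; the eight-slot array and all names in this sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int unused; } control_model;  /* stand-in for control */

static control_model ring[8];                  /* a tiny reorder buffer */
static control_model *ring_bot = &ring[0], *ring_top = &ring[7];

/* Number of slots strictly between c and cc, going downward from c
   and wrapping from ring_bot to ring_top. */
static ptrdiff_t issued_between_model(control_model *c, control_model *cc) {
  if (c > cc) return c - 1 - cc;               /* no wraparound */
  return (c - ring_bot) + (ring_top - cc);     /* wrapped past the bottom */
}

/* Usage: both cases, plus the case of nothing issued since c. */
static void issued_between_example(void) {
  assert(issued_between_model(&ring[5], &ring[2]) == 2);  /* slots 4, 3 */
  assert(issued_between_model(&ring[1], &ring[6]) == 2);  /* slots 0, 7 */
  assert(issued_between_model(&ring[0], &ring[7]) == 0);  /* none */
}
```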

160. If more than one functional unit is able to process branch instructions and if 
two of them simultaneously discover misprediction, or if misprediction is detected by 
one unit just as another unit is generating an interrupt, we assume that an arbitration 
takes place so that only the hottest one actually deissues the cooler instructions. 

Changes to the bp_table aren’t undone when they were made on speculation in an
instruction being deissued; nor do we worry about cases where the same bp_table
entry is being updated by two or more active coroutines. After all, the bp_table is
just a heuristic, not part of the real computation. We correct the bp_table only if we
discover that a prediction was wrong, so that we will be less likely to make the same
mistake later.

⟨Recover from incorrect branch prediction 160⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;   /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  inst_ptr.o = data->go.o, inst_ptr.p = NULL;
  if (!(data->loc.h & sign_bit)) {
    if (inst_ptr.o.h & sign_bit) data->interrupt |= P_BIT;
    else data->interrupt &= ~P_BIT;
  }
  if (bp_table) {
    bp_table[data->x.o.h] = data->x.o.l;   /* this is what we should have stored */
    if (verbose & show_pred_bit) {
      printf(" mispredicted "); print_octa(data->loc);
      printf("; bp[%x]=%d\n", data->x.o.h,
          data->x.o.l - ((data->x.o.l & bp_npower) << 1));
    }
  }
  cool_hist = (j ? (data->hist << 1) + 1 : data->hist << 1);

This code is used in section 155. 

161. ⟨External prototypes 9⟩ +≡
  Extern void print_stats ARGS((void));

162. ⟨External routines 10⟩ +≡
  void print_stats()
  {
    register int j;
    if (bp_table)
      printf("Predictions: %d in agreement, %d in opposition; %d good, %d bad\n",
          bp_ok_stat, bp_rev_stat, bp_good_stat, bp_bad_stat);
    else printf("Predictions: %d good, %d bad\n", bp_good_stat, bp_bad_stat);
    printf("Instructions issued per cycle:\n");
    for (j = 0; j <= dispatch_max; j++)
      printf("  %d   %d\n", j, dispatch_stat[j]);
  }



ARGS = macro, §6.
bp_bad_stat: int, §154.
bp_good_stat: int, §154.
bp_npower: int, §154.
bp_ok_stat: int, §154.
bp_rev_stat: int, §154.
bp_table: char *, §150.
control = struct, §44.
cool: control *, §60.
cool_hist: unsigned int, §99.
data: register control *, §124.
deissues: int, §60.
die: label, §144.
dispatch_max: int, §59.
dispatch_stat: int *, §66.
Extern = macro, §4.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
hist: unsigned int, §44.
i: register int, §12.
inst_ptr: spec, §284.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
loc: octa, §44.
mmix_opcode = enum, §47.
o: octa, §40.
octa = struct, §17.
old_tail: fetch *, §70.
op: mmix_opcode, §44.
p: specnode *, §40.
P_BIT = 1 << 0, §54.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
reorder_bot: control *, §60.
reorder_top: control *, §60.
resuming: int, §78.
show_pred_bit = 1 << 7, §8.
sign_bit = macro, §80.
tail: fetch *, §69.
verbose: int, §4.
x: specnode, §44.
163. Cache memory. It’s time now to consider MMIX’s MMU, the memory management
unit. This part of the machine deals with the critical problem of getting data
to and from the computational units. In a RISC architecture all interaction between
main memory and the computer registers is specified by load and store instructions;
thus memory accesses are much easier to deal with than they would be on a machine
with more complex kinds of interaction. But memory management is still difficult,
if we want to do it well, because main memory typically operates at a much slower
speed than the registers do. High-speed implementations of MMIX introduce intermediate
“caches” of storage in order to keep the most important data accessible, and
cache maintenance can be complicated when all the details are taken into account.
(See, for example, Chapter 5 of Hennessy and Patterson’s Computer Architecture,
second edition.)

This simulator can be configured to have up to three auxiliary caches between 
registers and memory: An I-cache for instructions, a D-cache for data, and an S-cache
for both instructions and data. The S-cache, also called a secondary cache, is
supported only if both I-cache and D-cache are present. Arbitrary access times for 
each cache can be specified independently; we might assume, for example, that data 
items in the I-cache or D-cache can be sent to a register in one or two clock cycles, 
but the access time for the S-cache might be say 5 cycles, and main memory might 
require 20 cycles or more. Our speculative pipeline can have many functional units 
handling load and store instructions, but only one load or store instruction can be 
updating the D-cache or S-cache or main memory at a time. (However, the D-cache 
can have several read ports; furthermore, data might be passing between the S-cache 
and memory while other data is passing between the reorder buffer and the D-cache.) 

Besides the optional I-cache, D-cache, and S-cache, there are required caches called 
the IT-cache and DT-cache, for translation of virtual addresses to physical addresses. 
A translation cache is often called a “translation lookaside buffer” or TLB; but we 
call it a cache since it is implemented in nearly the same way as an I-cache. 

164. Consider a cache that has blocks of $2^b$ bytes each and associativity $2^a$; here
$b \ge 3$ and $a \ge 0$. The I-cache, D-cache, and S-cache are addressed by 48-bit physical
addresses, as if they were part of main memory; but the IT and DT caches are
addressed by 64-bit keys, obtained from a virtual address by blanking out the lower
s bits and inserting the value of n, where the page size s and the process number n
are found in rV. We will consider all caches to be addressed by 64-bit keys, so that
both cases are handled with the same basic methods.

Given a 64-bit key, we ignore the low-order b bits and use the next c bits to address
the cache set; then the remaining $64 - b - c$ bits should match one of $2^a$ tags in that
set. The case a = 0 corresponds to a so-called direct-mapped cache; the case c = 0
corresponds to a so-called fully associative cache. With $2^c$ sets of $2^a$ blocks each, and
$2^b$ bytes per block, the cache contains $2^{a+b+c}$ bytes of data, in addition to the space
needed for tags. Translation caches have b = 3 and they also usually have c = 0.
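
The decomposition of a key can be sketched as follows. This is an illustration under our own assumptions: the simulator stores tags in a different form, and the names key_parts, split_key, and cache_data_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  uint64_t offset;  /* low-order b bits: byte position within the block */
  uint64_t set;     /* next c bits: which cache set to look in */
  uint64_t tag;     /* remaining 64 - b - c bits: must match a tag in the set */
} key_parts;

/* Split a 64-bit key for a cache with 2^b-byte blocks and 2^c sets. */
static key_parts split_key(uint64_t key, int b, int c) {
  key_parts p;
  assert(b >= 3 && c >= 0 && b + c < 64);
  p.offset = key & ((UINT64_C(1) << b) - 1);
  p.set = (key >> b) & ((UINT64_C(1) << c) - 1);
  p.tag = key >> (b + c);
  return p;
}

/* Data capacity in bytes: 2^c sets of 2^a blocks of 2^b bytes. */
static uint64_t cache_data_bytes(int a, int b, int c) {
  return UINT64_C(1) << (a + b + c);
}

/* Usage: 32-byte blocks (b = 5), 8 sets (c = 3), 4-way associative (a = 2). */
static void split_key_example(void) {
  key_parts p = split_key(UINT64_C(0x12345), 5, 3);
  assert(p.offset == 0x5 && p.set == 0x2 && p.tag == 0x123);
  assert(cache_data_bytes(2, 5, 3) == 1024);
  assert(split_key(UINT64_C(0x12345), 3, 0).set == 0);  /* fully associative */
}
```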

If a tag matches the specified bits, we “hit” in the cache and can use and/or 
update the data found there. Otherwise we “miss,” and we probably want to replace 
one of the cache blocks by the block containing the item sought. The item chosen 
for replacement is called a victim. The choice of victim is forced when the cache is
direct-mapped, but four strategies for victim selection are available when we must
choose from among $2^a$ entries for a > 0:

• “Random” selection chooses the victim by extracting the least significant a bits of
the clock.

• “Serial” selection chooses $0, 1, \ldots, 2^a - 1, 0, 1, \ldots, 2^a - 1, 0, \ldots$ on successive
trials.

• “LRU (Least Recently Used)” selection chooses the victim that ranks last if items
are ranked inversely to the time that has elapsed since their previous use.

• “Pseudo-LRU” selection chooses the victim by a rough approximation to LRU that
is simpler to implement in hardware. It requires a bit table $r_1 \ldots r_{2^a-1}$. Whenever
we use an item with binary address $(i_1 \ldots i_a)_2$ in the set, we adjust the bit table as
follows:

$$r_1 \gets 1 - i_1,\quad r_{1i_1} \gets 1 - i_2,\quad \ldots,\quad r_{1i_1\ldots i_{a-1}} \gets 1 - i_a;$$

here the subscripts on r are binary numbers. (For example, when a = 3, the use of
element $(010)_2$ sets $r_1 \gets 1$, $r_{10} \gets 0$, $r_{101} \gets 1$, where $r_{101}$ means the same as $r_5$.)
To select a victim, we start with $l \gets 1$ and then repeatedly set $l \gets 2l + r_l$, a times;
then we choose element $l - 2^a$. When a = 1, this scheme is equivalent to LRU. When
a = 2, this scheme was implemented in the Intel 80486 chip.
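
The pseudo-LRU rules can be exercised on a plain array of bits. The following sketch is a direct transcription of the two rules just stated, with hypothetical names; the array slot r[0] is unused so that the subscripts agree with the text.

```c
#include <assert.h>

/* Record a use of element elem (0 <= elem < 2^a) in the bit table r[1..2^a-1]. */
static void plru_use(int r[], int a, int elem) {
  int j = 1, m;
  assert(a >= 1 && elem >= 0 && elem < (1 << a));
  for (m = 1 << (a - 1); m; m >>= 1) {  /* scan the bits i_1, i_2, ..., i_a */
    int bit = (elem & m) != 0;
    r[j] = 1 - bit;                     /* point away from the used element */
    j = 2 * j + bit;                    /* append the bit to the subscript */
  }
}

/* Select a victim: start with l = 1, set l = 2l + r[l] a times,
   then choose element l - 2^a. */
static int plru_victim(const int r[], int a) {
  int l = 1, k;
  for (k = 0; k < a; k++) l = 2 * l + r[l];
  return l - (1 << a);
}

/* Usage: the example in the text, a = 3 and element (010)_2. */
static void plru_example(void) {
  int r[8] = {0, 0, 0, 0, 0, 0, 0, 0};
  plru_use(r, 3, 2);
  assert(r[1] == 1 && r[2] == 0 && r[5] == 1);
  assert(plru_victim(r, 3) == 4);  /* a victim from the half not just used */
}
```

Each use makes the bits along the element's path point the other way, so the next victim is always sought in a half of the set that was not touched most recently.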

⟨Type definitions 11⟩ +≡
  typedef enum {
    random, serial, pseudo_lru, lru
  } replace_policy;

165. A cache might also include a “victim” area, which contains the last $2^v$ victim
blocks removed from the main cache area. The victim area can be searched in parallel
with the specified cache set, thereby increasing the chance of a hit without making
the search go slower. Each of the four replacement policies can be used also in the
victim cache.



MMIX-PIPE: CACHE MEMORY 






166. A cache also has a granularity $2^g$, where $b \ge g \ge 3$. This means that we
maintain, for each cache block, a set of $2^{b-g}$ “dirty bits,” which identify the $2^g$-byte
groups that have possibly changed since they were last read from memory. Thus if
g = b, an entire cache block is either dirty or clean; if g = 3, the dirtiness of each
octabyte is maintained separately.

Two policies are available when new data is written into all or part of a cache block. 
We can write-through, meaning that we send all new data to memory immediately and 
never mark anything dirty; or we can write-back, meaning that we update the memory 
from the cache only when absolutely necessary. Furthermore we can write-allocate, 
meaning that we keep the new data in the cache, even if the cache block being written 
has to be fetched first because of a miss; or we can write-around, meaning that we
keep the new data only if it was part of an existing cache block. 

(In this discussion, “memory” is shorthand for “the next level of the memory
hierarchy”; if there is an S-cache, the I-cache and D-cache write new data to the
S-cache, not directly to memory. The I-cache, IT-cache, and DT-cache are read-only,
so they do not need the facilities discussed in this section. Moreover, the D-cache and
S-cache can be assumed to have the same granularity.)

⟨Header definitions 6⟩ +≡
#define WRITE_BACK 1   /* use this if not write-through */
#define WRITE_ALLOC 2   /* use this if not write-around */
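
The four combinations of these two flags can be tabulated by a small decision function. This sketch is only one plausible reading of the policies described above, not the simulator's logic; the type write_action and the function plan_write are hypothetical, and the two flag definitions are repeated so that the fragment stands alone.

```c
#include <assert.h>

#define WRITE_BACK 1   /* use this if not write-through */
#define WRITE_ALLOC 2  /* use this if not write-around */

typedef struct {
  int keep_in_cache;   /* does the new data end up in this cache? */
  int mark_dirty;      /* is the written granule marked dirty? */
  int send_to_memory;  /* is the data passed to the next level right away? */
} write_action;

/* What happens to newly written data, given the cache mode and
   whether the block being written is already present (a hit). */
static write_action plan_write(int mode, int hit) {
  write_action w;
  assert((mode & ~(WRITE_BACK | WRITE_ALLOC)) == 0);
  w.keep_in_cache = hit || (mode & WRITE_ALLOC) != 0;
  w.mark_dirty = w.keep_in_cache && (mode & WRITE_BACK) != 0;
  w.send_to_memory = !w.mark_dirty;  /* data must live somewhere */
  return w;
}

/* Usage: a miss under each of the two extreme modes. */
static void plan_write_example(void) {
  write_action w = plan_write(0, 0);         /* write-through, write-around */
  assert(!w.keep_in_cache && !w.mark_dirty && w.send_to_memory);
  w = plan_write(WRITE_BACK | WRITE_ALLOC, 0);
  assert(w.keep_in_cache && w.mark_dirty && !w.send_to_memory);
}
```

Under write-through nothing is ever marked dirty, and under write-around a miss leaves the cache untouched, so in both situations the data must travel onward immediately.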

167. We have seen that many flavors of cache can be simulated. They are represented
by cache structures, containing arrays of cacheset structures that contain
arrays of cacheblock structures for the individual blocks. We use a full byte to store
each dirty bit, and we use full integer words to store rank fields for LRU processing,
etc.; memory economy is less important than simplicity in this simulator.

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa tag;   /* bits of key not included in the cache block address */
    char *dirty;   /* array of 2^(b-g) dirty bits, one per granule */
    octa *data;   /* array of 2^(b-3) octabytes, the data in a cache block */
    int rank;   /* auxiliary information for non-random policies */
  } cacheblock;

  typedef cacheblock *cacheset;   /* array of 2^a or 2^v blocks */

  typedef struct {
    int a, b, c, g, v;
      /* lg of associativity, blocksize, setsize, granularity, and victimsize */
    int aa, bb, cc, gg, vv;
      /* associativity, blocksize, setsize, granularity, and victimsize (all powers of 2) */
    int tagmask;   /* -2^(b+c) */
    replace_policy repl, vrepl;   /* how to choose victims and victim-victims */
    int mode;   /* optional WRITE_BACK and/or WRITE_ALLOC */
    int access_time;   /* cycles to know if there’s a hit */
    int copy_in_time;   /* cycles to copy a new block into the cache */
    int copy_out_time;   /* cycles to copy an old block from the cache */
    cacheset *set;   /* array of 2^c sets of arrays of cache blocks */
    cacheset victim;   /* the victim cache, if present */
    coroutine filler;   /* a coroutine for copying new blocks into the cache */
    control filler_ctl;   /* its control block */
    coroutine flusher;   /* a coroutine for writing dirty old data from the cache */
    control flusher_ctl;   /* its control block */
    cacheblock inbuf;   /* filling comes from here */
    cacheblock outbuf;   /* flushing goes to here */
    lockvar lock;   /* nonzero when the cache is being changed significantly */
    lockvar fill_lock;   /* nonzero when filler should pass data back */
    int ports;   /* how many coroutines can be reading the cache? */
    coroutine *reader;   /* array of coroutines that might be reading simultaneously */
    char *name;   /* "Icache", for example */
  } cache;

168. ⟨External variables 4⟩ +≡
  Extern cache *Icache, *Dcache, *Scache, *ITcache, *DTcache;



169. Now we are ready to define some basic subroutines for cache maintenance. 
Let’s begin with a trivial routine that tests if a given cache block is dirty. 

⟨Internal prototypes 13⟩ +≡
  static bool is_dirty ARGS((cache *, cacheblock *));

170. ⟨Subroutines 14⟩ +≡
  static bool is_dirty(c, p)
    cache *c;   /* the cache containing it */
    cacheblock *p;   /* a cache block */
  {
    register int j;
    register char *d = p->dirty;
    for (j = 0; j < c->bb; d++, j += c->gg)
      if (*d) return true;
    return false;
  }



ARGS = macro, §6.
bool = enum, §11.
control = struct, §44.
coroutine = struct, §23.
Extern = macro, §4.
false = 0, §11.
lockvar = coroutine *, §37.
octa = struct, §17.
random = 0, §164.
replace_policy = enum, §164.
true = 1, §11.
171. For diagnostic purposes we might want to display an entire cache block.

⟨Internal prototypes 13⟩ +≡
  static void print_cache_block ARGS((cacheblock, cache *));

172. ⟨Subroutines 14⟩ +≡
  static void print_cache_block(p, c)
    cacheblock p;
    cache *c;
  { register int i, j, b = c->bb >> 3, g = c->gg >> 3;
    printf("%08x%08x: ", p.tag.h, p.tag.l);
    for (i = j = 0; j < b; j++, i += ((j & (g - 1)) ? 0 : 1))
      printf("%08x%08x%c", p.data[j].h, p.data[j].l, p.dirty[i] ? '*' : ' ');
    printf(" (%d)\n", p.rank);
  }

173. ⟨Internal prototypes 13⟩ +≡
  static void print_cache_locks ARGS((cache *));

174. ⟨Subroutines 14⟩ +≡
  static void print_cache_locks(c)
    cache *c;
  {
    if (c) {
      if (c->lock) printf("%s locked by %s:%d\n", c->name, c->lock->name, c->lock->stage);
      if (c->fill_lock)
        printf("%sfill locked by %s:%d\n", c->name, c->fill_lock->name, c->fill_lock->stage);
    }
  }

175. The print_cache routine prints the entire contents of a cache. This can be a 
huge amount of data, but it can be very useful when debugging. Fortunately, the task 
of debugging favors the use of small caches, since interesting cases arise more often 
when a cache is fairly small. 

⟨External prototypes 9⟩ +≡
  Extern void print_cache ARGS((cache *, bool));

176. ⟨External routines 10⟩ +≡
  void print_cache(c, dirty_only)
    cache *c;
    bool dirty_only;
  {
    if (c) { register int i, j;
      printf("%s of %s:", dirty_only ? "Dirty blocks" : "Contents", c->name);
      if (c->filler.next) {
        printf(" (filling ");
        print_octa(c->name[1] == 'T' ? c->filler_ctl.y.o : c->filler_ctl.z.o);
        printf(")");
      }
      if (c->flusher.next) {
        printf(" (flushing ");
        print_octa(c->outbuf.tag);
        printf(")");
      }
      printf("\n");
      ⟨Print all of c’s cache blocks 177⟩;
    }
  }

177. We don’t print the cache blocks that have an invalid tag, unless requested to
be verbose.

⟨Print all of c’s cache blocks 177⟩ ≡
  for (i = 0; i < c->cc; i++)
    for (j = 0; j < c->aa; j++)
      if ((!(c->set[i][j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
          (!dirty_only || is_dirty(c, &c->set[i][j]))) {
        printf("[%d][%d] ", i, j);
        print_cache_block(c->set[i][j], c);
      }
  for (j = 0; j < c->vv; j++)
    if ((!(c->victim[j].tag.h & sign_bit) || (verbose & show_wholecache_bit)) &&
        (!dirty_only || is_dirty(c, &c->victim[j]))) {
      printf("V[%d] ", j);
      print_cache_block(c->victim[j], c);
    }

This code is used in section 176. 



aa: int, §167.
ARGS = macro, §6.
bb: int, §167.
bool = enum, §11.
cache = struct, §167.
cacheblock = struct, §167.
cc: int, §167.
data: octa *, §167.
dirty: char *, §167.
Extern = macro, §4.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
flusher: coroutine, §167.
gg: int, §167.
h: tetra, §17.
is_dirty: static bool (), §170.
l: tetra, §17.
lock: lockvar, §167.
name: char *, §167.
next: coroutine *, §23.
o: octa, §40.
outbuf: cacheblock, §167.
print_octa: static void (), §19.
printf: int (), <stdio.h>.
rank: int, §167.
set: cacheset *, §167.
show_wholecache_bit = 1 << 8, §8.
sign_bit = macro, §80.
stage: int, §23.
tag: octa, §167.
verbose: int, §4.
victim: cacheset, §167.
vv: int, §167.
y: spec, §44.
z: spec, §44.
178. The clean_block routine simply initializes a given cache block.

⟨External prototypes 9⟩ +≡
  Extern void clean_block ARGS((cache *, cacheblock *));

179. ⟨External routines 10⟩ +≡
  void clean_block(c, p)
    cache *c;
    cacheblock *p;
  {
    register int j;
    p->tag.h = sign_bit, p->tag.l = 0;
    for (j = 0; j < c->bb >> 3; j++) p->data[j] = zero_octa;
    for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
  }

180. The zap_cache routine invalidates all tags of a given cache, effectively restoring
it to its initial condition.

⟨External prototypes 9⟩ +≡
  Extern void zap_cache ARGS((cache *));

181. We clear the dirty entries here, just to be tidy, although they could actually
be left in arbitrary condition when the tags are invalid.

⟨External routines 10⟩ +≡
  void zap_cache(c)
    cache *c;
  {
    register int i, j;
    for (i = 0; i < c->cc; i++)
      for (j = 0; j < c->aa; j++) {
        clean_block(c, &(c->set[i][j]));
      }
    for (j = 0; j < c->vv; j++) {
      clean_block(c, &(c->victim[j]));
    }
  }

182. The get_reader subroutine finds the index of an available reader coroutine for
a given cache, or returns a negative value if no readers are available.

⟨Internal prototypes 13⟩ +≡
  static int get_reader ARGS((cache *));

183. ⟨Subroutines 14⟩ +≡
  static int get_reader(c)
    cache *c;
  { register int j;
    for (j = 0; j < c->ports; j++)
      if (c->reader[j].next == NULL) return j;
    return -1;
  }
184. The subroutine copy_block(c, p, cc, pp) copies the dirty items from block p of
cache c into block pp of cache cc, assuming that the destination cache has a sufficiently
large block size. (In other words, we assume that cc->b >= c->b.) We also assume that
both blocks have compatible tags, and that both caches have the same granularity.

⟨Internal prototypes 13⟩ +≡
  static void copy_block ARGS((cache *, cacheblock *, cache *, cacheblock *));

185. ⟨Subroutines 14⟩ +≡
  static void copy_block(c, p, cc, pp)
    cache *c, *cc;
    cacheblock *p, *pp;
  {
    register int j, jj, i, ii, lim;
    register int off = p->tag.l & (cc->bb - 1);
    if (c->g != cc->g || p->tag.h != pp->tag.h || p->tag.l - off != pp->tag.l)
      panic(confusion("copy block"));
    for (j = 0, jj = off >> c->g; j < c->bb >> c->g; j++, jj++)
      if (p->dirty[j]) {
        pp->dirty[jj] = true;
        for (i = j << (c->g - 3), ii = jj << (c->g - 3), lim = (j + 1) << (c->g - 3);
             i < lim; i++, ii++) pp->data[ii] = p->data[i];
      }
  }



aa: int, §167.
ARGS = macro, §6.
b: int, §167.
bb: int, §167.
cache = struct, §167.
cacheblock = struct, §167.
cc: int, §167.
confusion = macro (), §13.
data: octa *, §167.
dirty: char *, §167.
Extern = macro, §4.
false = 0, §11.
g: int, §167.
h: tetra, §17.
l: tetra, §17.
next: coroutine *, §23.
panic = macro (), §13.
ports: int, §167.
reader: coroutine *, §167.
set: cacheset *, §167.
sign_bit = macro, §80.
tag: octa, §167.
true = 1, §11.
victim: cacheset, §167.
vv: int, §167.
zero_octa: octa, MMIX-ARITH §4.
186. The choose_victim subroutine selects the victim to be replaced when we need
to change a cache set. We need only one bit of the rank fields to implement the r table
when policy = pseudo_lru, and we don’t need rank at all when policy = random. Of
course we use an a-bit counter to implement policy = serial. In the other case,
policy = lru, we need an a-bit rank field; the least recently used entry has rank 0,
and the most recently used entry has rank $2^a - 1$ = aa - 1.

⟨Internal prototypes 13⟩ +≡
  static cacheblock *choose_victim ARGS((cacheset, int, replace_policy));

187. ⟨Subroutines 14⟩ +≡
  static cacheblock *choose_victim(s, aa, policy)
    cacheset s;
    int aa;   /* setsize */
    replace_policy policy;
  {
    register cacheblock *p;
    register int l, m;
    switch (policy) {
    case random: return &s[ticks.l & (aa - 1)];
    case serial: l = s[0].rank; s[0].rank = (l + 1) & (aa - 1); return &s[l];
    case lru:
      for (p = s; p < s + aa; p++)
        if (p->rank == 0) return p;
      panic(confusion("lru victim"));   /* what happened? nobody has rank zero */
    case pseudo_lru:
      for (l = 1, m = aa >> 1; m; m >>= 1) l = l + l + s[l].rank;
      return &s[l - aa];
    }
  }

188. The note_usage subroutine updates the rank entries to record the fact that a
particular block in a cache set is now being used.

⟨Internal prototypes 13⟩ +≡
static void note_usage ARGS((cacheblock *, cacheset, int, replace_policy));

189. ⟨Subroutines 14⟩ +≡
static void note_usage(l, s, aa, policy)
  cacheblock *l; /* a cache block that’s probably worth preserving */
  cacheset s; /* the set that contains l */
  int aa; /* setsize */
  replace_policy policy;
{
  register cacheblock *p;
  register int j, m, r;
  if (aa == 1 || policy <= serial) return;
  if (policy == lru) {
    r = l->rank;
    for (p = s; p < s + aa; p++)
      if (p->rank > r) p->rank--;
    l->rank = aa - 1;
  }
  else { /* policy == pseudo_lru */
    r = l - s;
    for (j = 1, m = aa >> 1; m; m >>= 1)
      if (r & m) s[j].rank = 0, j = j + j + 1;
      else s[j].rank = 1, j = j + j;
  }
  return;
}
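
The pseudo-LRU branches of choose_victim and note_usage can be tried in isolation. The following is only a sketch under stated assumptions: a plain int array (a hypothetical stand-in, not part of the simulator) replaces the rank fields s[1..aa-1], and the set size is fixed at 4.

```c
#include <assert.h>

/* Hypothetical stand-in for the rank fields of one cache set:
   bits[1..AA-1] are the internal nodes of a complete binary tree. */
enum { AA = 4 };           /* set size; must be a power of 2 */

/* Follow the tree bits from the root to a leaf, as in choose_victim. */
static int plru_victim(const int bits[AA]) {
  int l, m;
  for (l = 1, m = AA >> 1; m; m >>= 1) l = l + l + bits[l];
  return l - AA;           /* leaf number = index within the set */
}

/* Point every bit on the path to block r away from r, as in note_usage. */
static void plru_note(int bits[AA], int r) {
  int j, m;
  assert(0 <= r && r < AA);
  for (j = 1, m = AA >> 1; m; m >>= 1) {
    if (r & m) bits[j] = 0, j = j + j + 1;
    else       bits[j] = 1, j = j + j;
  }
}
```

Starting from all-zero bits and noting each victim as it is chosen, the victims come out in the order 0, 2, 1, 3; a block that has just been noted is never the next victim.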

190. The demote_usage subroutine is sort of the opposite of note_usage; it changes
the rank of a given block to least recently used.

⟨Internal prototypes 13⟩ +≡
static void demote_usage ARGS((cacheblock *, cacheset, int, replace_policy));

191. ⟨Subroutines 14⟩ +≡
static void demote_usage(l, s, aa, policy)
  cacheblock *l; /* a cache block we probably don’t need */
  cacheset s; /* the set that contains l */
  int aa; /* setsize */
  replace_policy policy;
{
  register cacheblock *p;
  register int j, m, r;
  if (aa == 1 || policy <= serial) return;
  if (policy == lru) {
    r = l->rank;
    for (p = s; p < s + aa; p++)
      if (p->rank < r) p->rank++;
    l->rank = 0;
  }
  else { /* policy == pseudo_lru */
    r = l - s;
    for (j = 1, m = aa >> 1; m; m >>= 1)
      if (r & m) s[j].rank = 1, j = j + j + 1;
      else s[j].rank = 0, j = j + j;
  }
  return;
}
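
For the lru policy, note_usage and demote_usage both keep the ranks of a set a permutation of 0, ..., aa - 1. A minimal sketch, assuming a bare int array in place of the rank fields (the names lru_note, lru_demote, and lru_victim are illustrative only):

```c
#include <assert.h>

enum { AA = 4 };  /* hypothetical set size */

/* Mark block l as most recently used (the lru branch of note_usage). */
static void lru_note(int rank[AA], int l) {
  int r, p;
  assert(0 <= l && l < AA);
  r = rank[l];
  for (p = 0; p < AA; p++)
    if (rank[p] > r) rank[p]--;
  rank[l] = AA - 1;
}

/* Mark block l as least recently used (the lru branch of demote_usage). */
static void lru_demote(int rank[AA], int l) {
  int r, p;
  assert(0 <= l && l < AA);
  r = rank[l];
  for (p = 0; p < AA; p++)
    if (rank[p] < r) rank[p]++;
  rank[l] = 0;
}

/* The victim is the unique block of rank 0. */
static int lru_victim(const int rank[AA]) {
  int p;
  for (p = 0; p < AA; p++)
    if (rank[p] == 0) return p;
  return -1;  /* cannot happen while the ranks form a permutation */
}
```

Because each update shifts only the ranks on one side of the old rank of l, exactly one block always has rank 0.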



aa: int, §167.
ARGS = macro, §6.
cacheblock = struct, §167.
cacheset = cacheblock *, §167.
confusion = macro (), §13.
lru = 3, §164.
panic = macro (), §13.
pseudo_lru = 2, §164.
random = 0, §164.
rank: int, §167.
replace_policy = enum, §164.
serial = 1, §164.
ticks: Extern octa, §87.
192. The cache_search routine looks for a given key alf in a given cache, and returns
a cache block if there’s a hit; otherwise it returns NULL. If the search hits, the set in
which the block was found is stored in global variable hit_set. Notice that we need to
check more bits of the tag when we search in the victim area.

#define cache_addr(c, alf) c->set[(alf.l & ~(c->tagmask)) >> c->b]

⟨Internal prototypes 13⟩ +≡
static cacheblock *cache_search ARGS((cache *, octa));

193. ⟨Subroutines 14⟩ +≡
static cacheblock *cache_search(c, alf)
  cache *c; /* the cache to be searched */
  octa alf; /* the key */
{
  register cacheset s;
  register cacheblock *p;
  s = cache_addr(c, alf); /* the set corresponding to alf */
  for (p = s; p < s + c->aa; p++)
    if (((p->tag.l ^ alf.l) & c->tagmask) == 0 && p->tag.h == alf.h) goto hit;
  s = c->victim;
  if (!s) return NULL; /* cache miss, and no victim area */
  for (p = s; p < s + c->vv; p++)
    if (((p->tag.l ^ alf.l) & (-c->bb)) == 0 && p->tag.h == alf.h) goto hit;
  return NULL; /* double miss */
hit: hit_set = s; return p;
}

194. ⟨Global variables 20⟩ +≡
cacheset hit_set;
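
The difference between the two tag comparisons can be seen on the low tetrabyte alone. This sketch assumes a hypothetical geometry (32-byte blocks, 8 sets) and assumes that tagmask equals -2^(b+c), which is defined in §167 and not shown here; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

enum { B = 5, C = 3 };   /* hypothetical: 32-byte blocks, 8 sets */
#define BB      (1u << B)
#define TAGMASK (~((1u << (B + C)) - 1u))   /* assumed to equal -2^(b+c) */

/* Which set a key belongs to, as in the cache_addr macro. */
static uint32_t set_index(uint32_t lo) {
  uint32_t idx = (lo & ~TAGMASK) >> B;
  assert(idx < (1u << C));
  return idx;
}

/* Main area: the set bits already agree, so only the tag bits are compared. */
static int main_area_match(uint32_t tag_lo, uint32_t lo) {
  return ((tag_lo ^ lo) & TAGMASK) == 0;
}

/* Victim area: blocks from any set may be present, so the set bits count too. */
static int victim_area_match(uint32_t tag_lo, uint32_t lo) {
  return ((tag_lo ^ lo) & ~(BB - 1u)) == 0;
}
```

With these values, the keys 0x1200 and 0x1234 agree under the main-area test but differ under the victim-area test, because they fall into different sets.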

195. If p = cache_search(c, alf) hits and if we call use_and_fix(c, p) immediately
afterwards, cache c is updated to record the usage of key alf. A hit in the victim area
moves the cache block to the main area, unless the filler routine of cache c is active.
A pointer to the (possibly moved) cache block is returned.

⟨Internal prototypes 13⟩ +≡
static cacheblock *use_and_fix ARGS((cache *, cacheblock *));

196. ⟨Subroutines 14⟩ +≡
static cacheblock *use_and_fix(c, p)
  cache *c;
  cacheblock *p;
{
  if (hit_set != c->victim) note_usage(p, hit_set, c->aa, c->repl);
  else {
    note_usage(p, hit_set, c->vv, c->vrepl); /* found in victim cache */
    if (!c->filler.next) {
      register cacheset s = cache_addr(c, p->tag);
      register cacheblock *q = choose_victim(s, c->aa, c->repl);
      note_usage(q, s, c->aa, c->repl);
      ⟨Swap cache blocks p and q 197⟩;
      return q;
    }
  }
  return p;
}

197. We can simply permute the pointers inside the cacheblock structures of a
cache, instead of copying the data, if we are careful not to let any of those pointers
escape into other data structures.

⟨Swap cache blocks p and q 197⟩ ≡
{
  octa t;
  register char *d = p->dirty;
  register octa *dd = p->data;
  t = p->tag; p->tag = q->tag; q->tag = t;
  p->dirty = q->dirty; q->dirty = d;
  p->data = q->data; q->data = dd;
}

This code is used in sections 196 and 205.

198. The demote_and_fix routine is analogous to use_and_fix, except that we don’t
want to promote the data we found.

⟨Internal prototypes 13⟩ +≡
static cacheblock *demote_and_fix ARGS((cache *, cacheblock *));

199. ⟨Subroutines 14⟩ +≡
static cacheblock *demote_and_fix(c, p)
  cache *c;
  cacheblock *p;
{
  if (hit_set != c->victim) demote_usage(p, hit_set, c->aa, c->repl);
  else demote_usage(p, hit_set, c->vv, c->vrepl);
  return p;
}



aa: int, §167.
ARGS = macro, §6.
b: int, §167.
bb: int, §167.
cache = struct, §167.
cacheblock = struct, §167.
cacheset = cacheblock *, §167.
choose_victim: static cacheblock *(), §187.
data: octa *, §167.
demote_usage: static void (), §191.
dirty: char *, §167.
filler: coroutine, §167.
h: tetra, §17.
l: tetra, §17.
next: coroutine *, §23.
note_usage: static void (), §189.
octa = struct, §17.
p: register cacheblock *, §205.
q: register cacheblock *, §205.
repl: replace_policy, §167.
set: cacheset *, §167.
tag: octa, §167.
tagmask: int, §167.
victim: cacheset, §167.
vrepl: replace_policy, §167.
vv: int, §167.
200. The subroutine load_cache(c, p) is called at a moment when c->lock has been
set and c->inbuf has been filled with clean data to be placed in the cache block p.

⟨Internal prototypes 13⟩ +≡
static void load_cache ARGS((cache *, cacheblock *));

201. ⟨Subroutines 14⟩ +≡
static void load_cache(c, p)
  cache *c;
  cacheblock *p;
{
  register int i;
  register octa *d;
  for (i = 0; i < c->bb >> c->g; i++) p->dirty[i] = false;
  d = p->data; p->data = c->inbuf.data; c->inbuf.data = d;
  p->tag = c->inbuf.tag;
  hit_set = cache_addr(c, p->tag); use_and_fix(c, p); /* p not moved */
}

202. The subroutine flush_cache(c, p, keep) is called at a “quiet” moment when
c->flusher.next = NULL. It puts cache block p into c->outbuf and fires up the c->flusher
coroutine, which will take care of sending the data to lower levels of the memory
hierarchy. Cache block p is also marked clean.

⟨Internal prototypes 13⟩ +≡
static void flush_cache ARGS((cache *, cacheblock *, bool));

203. ⟨Subroutines 14⟩ +≡
static void flush_cache(c, p, keep)
  cache *c;
  cacheblock *p; /* a block inside cache c */
  bool keep; /* should we preserve the data in p? */
{
  register octa *d;
  register char *dd;
  register int j;
  c->outbuf.tag = p->tag;
  if (keep) for (j = 0; j < c->bb >> 3; j++) c->outbuf.data[j] = p->data[j];
  else d = c->outbuf.data, c->outbuf.data = p->data, p->data = d;
  dd = c->outbuf.dirty, c->outbuf.dirty = p->dirty, p->dirty = dd;
  for (j = 0; j < c->bb >> c->g; j++) p->dirty[j] = false;
  startup(&c->flusher, c->copy_out_time); /* will not be aborted */
}

204. The alloc_slot routine is called when we wish to put new information into a
cache after a cache miss. It returns a pointer to a cache block in the main area where
the new information should be put. The tag of that cache block is invalidated; the
calling routine should take care of filling it and giving it a valid tag in due time. The
cache’s filler routine should not be active when alloc_slot is called.

Inserting new information might also require writing old information into the next
level of the memory hierarchy, if the block being replaced is dirty. This routine returns
NULL in such cases if the cache is flushing a previously discarded block. Otherwise it
schedules the flusher coroutine.

This routine returns NULL also if the given key happens to be in the cache. Such
cases are rare, but the following scenario shows that they aren’t impossible: Suppose
the DT-cache access time is 5, the D-cache access time is 1, and two processes
simultaneously look for the same physical address. One process hits in DT-cache but
misses in D-cache, waiting 5 cycles before trying alloc_slot in the D-cache; meanwhile
the other process missed in D-cache but didn’t need to use the DT-cache, so it might
have updated the D-cache.

A key value is never negative. Therefore we can invalidate the tag in the chosen
slot by forcing it to be negative.

⟨Internal prototypes 13⟩ +≡
static cacheblock *alloc_slot ARGS((cache *, octa));

205. ⟨Subroutines 14⟩ +≡
static cacheblock *alloc_slot(c, alf)
  cache *c;
  octa alf; /* key that probably isn’t in the cache */
{
  register cacheset s;
  register cacheblock *p, *q;
  if (cache_search(c, alf)) return NULL;
  if (c->flusher.next && c->outbuf.tag.h == alf.h &&
      !((c->outbuf.tag.l ^ alf.l) & c->tagmask))
    return NULL;
  s = cache_addr(c, alf); /* the set corresponding to alf */
  if (c->victim) p = choose_victim(c->victim, c->vv, c->vrepl);
  else p = choose_victim(s, c->aa, c->repl);
  if (is_dirty(c, p)) {
    if (c->flusher.next) return NULL;
    flush_cache(c, p, false);
  }
  if (c->victim) {
    q = choose_victim(s, c->aa, c->repl);
    ⟨Swap cache blocks p and q 197⟩;
    q->tag.h |= sign_bit; /* invalidate the tag */
    return q;
  }
  p->tag.h |= sign_bit; return p;
}



aa: int, §167.
ARGS = macro, §6.
bb: int, §167.
bool = enum, §11.
cache = struct, §167.
cache_addr = macro (), §192.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
cacheset = cacheblock *, §167.
choose_victim: static cacheblock *(), §187.
copy_out_time: int, §167.
data: octa *, §167.
dirty: char *, §167.
false = 0, §11.
filler: coroutine, §167.
flusher: coroutine, §167.
g: int, §167.
h: tetra, §17.
hit_set: cacheset, §194.
inbuf: cacheblock, §167.
is_dirty: static bool (), §170.
l: tetra, §17.
lock: lockvar, §167.
next: coroutine *, §23.
octa = struct, §17.
outbuf: cacheblock, §167.
repl: replace_policy, §167.
sign_bit = macro, §80.
startup: static void (), §31.
tag: octa, §167.
tagmask: int, §167.
use_and_fix: static cacheblock *(), §196.
victim: cacheset, §167.
vrepl: replace_policy, §167.
vv: int, §167.
206. Simulated memory. How should we deal with the potentially gigantic
memory of MMIX? We can’t simply declare an array m that has 2^48 bytes. (Indeed,
up to 2^63 bytes are needed, if we consider also the physical addresses ≥ 2^48 that are
reserved for memory-mapped input/output.)

We could regard memory as a special kind of cache, in which every access is required
to hit. For example, such an “M-cache” could be fully associative, with 2^a blocks
each having a different tag; simulation could proceed until more than 2^a - 1 tags are
required. But then the predefined value of a might well be so large that the sequential
search of our cache_search routine would be too slow.

Instead, we will allocate memory in chunks of 2^16 bytes at a time, as needed, and
we will use hashing to search for the relevant chunk whenever a physical address is
given. If the address is 2^48 or greater, special routines called spec_read and spec_write,
supplied by the user, will be called upon to do the reading or writing. Otherwise the
48-bit address consists of a 32-bit chunk address and a 16-bit chunk offset.

Chunk addresses that are not used take no space in this simulator. But if, say, 1000
such patterns occur, the simulator will dynamically allocate approximately 65MB for
the portions of main memory that are used. Parameter mem_chunks_max specifies
the largest number of different chunk addresses that are supported. This parameter
does not constrain the range of simulated physical addresses, which cover the entire
256 large-terabyte range permitted by MMIX.

⟨Type definitions 11⟩ +≡
typedef struct {
  tetra tag; /* 32-bit chunk address */
  octa *chunk; /* either NULL or an array of 2^13 octabytes */
} chunknode;
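
The split of a 48-bit physical address into chunk address and chunk offset can be sketched as follows. The octa type here is a simplified stand-in for the one defined in §17, and the helper names chunk_key and chunk_offset are illustrative; the two expressions are the ones that mem_read and mem_write use for key and off.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;  /* high and low tetrabytes */

/* 32-bit chunk address: address bits 16..47, packed so that the low
   half of the key holds address bits 32..47. */
static uint32_t chunk_key(octa addr) {
  assert(addr.h < 0x10000u);  /* ordinary memory lies below 2^48 */
  return (addr.l & 0xffff0000u) + addr.h;
}

/* Octabyte index within the 2^16-byte chunk (0 .. 2^13 - 1). */
static uint32_t chunk_offset(octa addr) { return (addr.l & 0xffffu) >> 3; }
```

For instance, the address with high tetrabyte 0x0001 and low tetrabyte 0x2345abc8 has chunk key 0x23450001 and octabyte offset 0x1579.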

207. The parameter hash_prime should be a prime number larger than the parameter
mem_chunks_max, preferably more than twice as large but not much bigger
than that. The default values mem_chunks_max = 1000 and hash_prime = 2003 are
set by MMIX_config unless the user specifies otherwise.

⟨External variables 4⟩ +≡
Extern int mem_chunks; /* this many chunks are allocated so far */
Extern int mem_chunks_max; /* up to this many different chunks per run */
Extern int hash_prime; /* larger than mem_chunks_max, but not enormous */
Extern chunknode *mem_hash; /* the simulated main memory */

208. The separately compiled procedures spec_read() and spec_write() have the
same calling conventions as the general procedures mem_read() and mem_write(),
but with an additional size parameter, which specifies that 1 << size bytes should be
read or written.

⟨Subroutines 14⟩ +≡
extern octa spec_read ARGS((octa addr, int size)); /* for memory mapped I/O */
extern void spec_write ARGS((octa addr, octa val, int size)); /* likewise */
209. If the program tries to read from a chunk that hasn’t been allocated, the value
zero is returned, optionally with a comment to the user.

Chunk address 0 is always allocated first. Then we can assume that a matching
chunk tag implies a nonnull chunk pointer.

This routine sets last_h to the chunk found, so that we can rapidly read other words
that we know must belong to the same chunk. For this purpose it is convenient to let
mem_hash[hash_prime] be a chunk full of zeros, representing uninitialized memory.

⟨External prototypes 9⟩ +≡
Extern octa mem_read ARGS((octa addr));

210. ⟨External routines 10⟩ +≡
octa mem_read(addr)
  octa addr;
{
  register tetra off, key;
  register int h;
  off = (addr.l & 0xffff) >> 3;
  key = (addr.l & 0xffff0000) + addr.h;
  for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
    if (mem_hash[h].chunk == NULL) {
      if (verbose & uninit_mem_bit)
        errprint2("uninitialized memory read at %08x%08x", addr.h, addr.l);
      h = hash_prime; break; /* zero will be returned */
    }
    if (h == 0) h = hash_prime;
  }
  last_h = h;
  return mem_hash[h].chunk[off];
}

211. ⟨External variables 4⟩ +≡
Extern int last_h; /* the hash index that was most recently correct */



ARGS = macro, §6.
cache_search: static cacheblock *(), §193.
errprint2 = macro (), §13.
Extern = macro, §4.
h: tetra, §17.
l: tetra, §17.
mem_write: void (), §213.
MMIX_config: void (), MMIX-CONFIG §38.
octa = struct, §17.
spec_read: octa (), MMIX-MEM §2.
spec_write: void (), MMIX-MEM §3.
tetra = unsigned int, §17.
uninit_mem_bit = 1 << 4, §8.
verbose: int, §4.






212. ⟨External prototypes 9⟩ +≡
Extern void mem_write ARGS((octa addr, octa val));

213. ⟨External routines 10⟩ +≡
void mem_write(addr, val)
  octa addr, val;
{
  register tetra off, key;
  register int h;
  off = (addr.l & 0xffff) >> 3;
  key = (addr.l & 0xffff0000) + addr.h;
  for (h = key % hash_prime; mem_hash[h].tag != key; h--) {
    if (mem_hash[h].chunk == NULL) {
      if (++mem_chunks > mem_chunks_max)
        panic(errprint1("More than %d memory chunks are needed",
                        mem_chunks_max));
      mem_hash[h].chunk = (octa *) calloc(1 << 13, sizeof(octa));
      if (mem_hash[h].chunk == NULL)
        panic(errprint1("I can’t allocate memory chunk number %d", mem_chunks));
      mem_hash[h].tag = key;
      break;
    }
    if (h == 0) h = hash_prime;
  }
  last_h = h;
  mem_hash[h].chunk[off] = val;
}
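
The probing order used by mem_read and mem_write (start at key mod hash_prime, step downward, wrap from slot 0 to slot hash_prime - 1) can be exercised on a toy table. This is a sketch only: the node type, the table size of 7, and the helper names probe and insert are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

enum { HASH_PRIME = 7 };      /* tiny illustrative table; the default is 2003 */

typedef struct { unsigned tag; long *chunk; } node;  /* hypothetical chunknode */
static node table[HASH_PRIME + 1];                   /* extra slot: all zeros */

/* Probe downward from key % HASH_PRIME, wrapping from 0 to the top.
   Returns the slot holding key, or the first empty slot met. */
static int probe(unsigned key) {
  int h;
  for (h = (int)(key % HASH_PRIME); table[h].tag != key; h--) {
    if (table[h].chunk == NULL) break;
    if (h == 0) h = HASH_PRIME;   /* the loop's h-- then gives HASH_PRIME - 1 */
  }
  return h;
}

/* Allocate a chunk for key if it has none yet; return its slot. */
static int insert(unsigned key) {
  int h = probe(key);
  if (table[h].chunk == NULL) {
    table[h].chunk = calloc(4, sizeof(long));
    assert(table[h].chunk != NULL);
    table[h].tag = key;
  }
  return h;
}
```

Keys 3 and 10 collide at slot 3, so 10 lands in slot 2; keys 7 and 14 collide at slot 0, so 14 wraps around to slot 6.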

214. The memory is characterized by several parameters, depending on the
characteristics of the memory bus being simulated. Let bus_words be the number of
octabytes read or written simultaneously (usually bus_words is 1 or 2; it must be
a power of 2). The number of clock cycles needed to read or write c * bus_words
octabytes that all belong to the same cache block is assumed to be mem_addr_time +
c * mem_read_time or mem_addr_time + c * mem_write_time, respectively.

⟨External variables 4⟩ +≡
Extern int mem_addr_time; /* cycles to transmit an address on memory bus */
Extern int bus_words; /* width of memory bus, in octabytes */
Extern int mem_read_time; /* cycles to read from main memory */
Extern int mem_write_time; /* cycles to write to main memory */
Extern lockvar mem_lock; /* is nonnull when the bus is busy */
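
As a worked instance of the timing rule, here is a small sketch. The parameter values are made up for illustration (the real ones come from the configuration file), and read_cycles and write_cycles are hypothetical helpers, not routines of the simulator.

```c
#include <assert.h>

/* Hypothetical parameter values, chosen only for the example. */
static int mem_addr_time = 4, mem_read_time = 10, mem_write_time = 8;
static int bus_words = 2;   /* octabytes per bus transfer, a power of 2 */

/* Cycles to read n octabytes of one cache block, n a multiple of bus_words. */
static int read_cycles(int n) {
  assert(n % bus_words == 0);
  return mem_addr_time + (n / bus_words) * mem_read_time;
}

/* Cycles to write n octabytes of one cache block, n a multiple of bus_words. */
static int write_cycles(int n) {
  assert(n % bus_words == 0);
  return mem_addr_time + (n / bus_words) * mem_write_time;
}
```

With these numbers a 64-byte block (8 octabytes) costs 4 + 4 * 10 = 44 cycles to read and 4 + 4 * 8 = 36 cycles to write.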

215. One of the principal ways to write memory is to invoke a flush_to_mem coroutine,
which is the Scache->flusher if there is an S-cache, or the Dcache->flusher if there
is a D-cache but no S-cache.

When such a coroutine is started, its data->ptr_a will be Scache or Dcache. The
data to be written will just have been copied to the cache’s outbuf.

⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_mem:
  { register cache *c = (cache *) data->ptr_a;
    switch (data->state) {
    case 0: if (mem_lock) wait(1);
      data->state = 1;
    case 1: set_lock(self, mem_lock);
      data->state = 2;
      ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩;
    case 2: goto terminate; /* this frees mem_lock and c->outbuf */
    }
  }

216. ⟨Write the dirty data of c->outbuf and wait for the bus 216⟩ ≡
{
  register int off, last_off, count, first, ii;
  register int del = c->gg >> 3; /* octabytes per granule */
  octa addr;
  addr = c->outbuf.tag; off = (addr.l & 0xffff) >> 3;
  for (i = j = 0, first = 1, count = 0; j < c->bb >> c->g; j++) {
    ii = i + del;
    if (!c->outbuf.dirty[j]) i = ii, off += del, addr.l += del << 3;
    else while (i < ii) {
      if (first) {
        count++; last_off = off; first = 0;
        mem_write(addr, c->outbuf.data[i]);
      } else {
        if ((off ^ last_off) & (-bus_words)) count++;
        last_off = off;
        mem_hash[last_h].chunk[off] = c->outbuf.data[i];
      }
      i++; off++; addr.l += 8;
    }
  }
  wait(mem_addr_time + count * mem_write_time);
}

This code is used in section 215.
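
The counting logic above charges one bus transfer for each bus-width group of octabytes that contains at least one dirty word. A stripped-down sketch of just that count follows; count_transfers and its parameters are illustrative names, and the actual memory writes are omitted.

```c
#include <assert.h>

/* Count bus transfers needed to write the dirty granules of one block.
   dirty[j] flags granule j; each granule holds del octabytes; off is the
   octabyte offset of the block's first word; bus_words is a power of 2. */
static int count_transfers(const char *dirty, int granules, int del,
                           int off, int bus_words) {
  int j, k, count = 0, first = 1, last_off = 0;
  assert(bus_words > 0 && (bus_words & (bus_words - 1)) == 0);
  for (j = 0; j < granules; j++, off += del) {
    if (!dirty[j]) continue;
    for (k = 0; k < del; k++) {
      int o = off + k;
      /* a new transfer starts whenever we leave the current bus-width group */
      if (first || ((o ^ last_off) & -bus_words)) count++;
      first = 0;
      last_off = o;
    }
  }
  return count;
}
```

With a two-octabyte bus and one-octabyte granules, four dirty granules cost two transfers, and so do the two dirty granules 0 and 3, since they lie in different bus-width groups.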



ARGS = macro, §6.
bb: int, §167.
cache = struct, §167.
calloc: void *(), <stdlib.h>.
chunk: octa *, §206.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
errprint1 = macro (), §13.
Extern = macro, §4.
flush_to_mem = 97, §129.
flusher: coroutine, §167.
g: int, §167.
gg: int, §167.
h: tetra, §17.
hash_prime: int, §207.
i: register int, §12.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
lockvar = coroutine *, §37.
mem_chunks: int, §207.
mem_chunks_max: int, §207.
mem_hash: chunknode *, §207.
octa = struct, §17.
outbuf: cacheblock, §167.
panic = macro (), §13.
ptr_a: void *, §44.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
state: int, §44.
tag: tetra, §206.
tag: octa, §167.
terminate: label, §125.
tetra = unsigned int, §17.
wait = macro (), §125.
217. Cache transfers. We have seen that the Dcache->flusher sends data directly
to the main memory if there is no S-cache. But if both D-cache and S-cache exist, the
Dcache->flusher is a more complicated coroutine of type flush_to_S. In this case we
need to deal with the fact that the S-cache blocks might be larger than the D-cache
blocks; furthermore, the S-cache might have a write-around and/or write-through
policy, etc. But one simplifying fact does help us: We know that the flusher coroutine
will not be aborted until it has run to completion.

Some machines, such as the Alpha 21164, have an additional cache between the
S-cache and memory, called the B-cache (the “backup cache”). A B-cache could be
simulated by extending the logic used here; but such extensions of the present program
are left to the interested reader.

⟨Cases for control of special coroutines 126⟩ +≡
case flush_to_S:
  { register cache *c = (cache *) data->ptr_a;
    register int block_diff = Scache->bb - c->bb;
    p = (cacheblock *) data->ptr_b;
    switch (data->state) {
    case 0: if (Scache->lock) wait(1);
      data->state = 1;
    case 1: set_lock(self, Scache->lock);
      data->ptr_b = (void *) cache_search(Scache, c->outbuf.tag);
      if (data->ptr_b) data->state = 4;
      else if (Scache->mode & WRITE_ALLOC) data->state = (block_diff ? 2 : 3);
      else data->state = 6;
      wait(Scache->access_time);
    case 2: ⟨Fill Scache->inbuf with clean memory data 219⟩;
    case 3: ⟨Allocate a slot p in the S-cache 218⟩;
      if (block_diff) ⟨Copy Scache->inbuf to slot p 220⟩;
    case 4: copy_block(c, &(c->outbuf), Scache, p);
      hit_set = cache_addr(Scache, c->outbuf.tag); use_and_fix(Scache, p);
        /* p not moved */
      data->state = 5; wait(Scache->copy_in_time);
    case 5: if ((Scache->mode & WRITE_BACK) == 0) { /* write-through */
        if (Scache->flusher.next) wait(1);
        flush_cache(Scache, p, true);
      }
      goto terminate;
    case 6: ⟨Handle write-around when flushing to the S-cache 221⟩;
    }
  }

218. ⟨Allocate a slot p in the S-cache 218⟩ ≡
if (Scache->filler.next) wait(1); /* perhaps an unnecessary precaution? */
p = alloc_slot(Scache, c->outbuf.tag);
if (!p) wait(1);
data->ptr_b = (void *) p;
p->tag = c->outbuf.tag; p->tag.l = c->outbuf.tag.l & (-Scache->bb);

This code is used in section 217.
219. We only need to read block_diff bytes, but it’s easier to read them all and to
charge only for reading the ones we needed.

⟨Fill Scache->inbuf with clean memory data 219⟩ ≡
{ register int count = block_diff >> 3;
  register int off, delay;
  octa addr;
  if (mem_lock) wait(1);
  addr.h = c->outbuf.tag.h; addr.l = c->outbuf.tag.l & -Scache->bb;
  off = (addr.l & 0xffff) >> 3;
  for (j = 0; j < Scache->bb >> 3; j++)
    if (j == 0) Scache->inbuf.data[j] = mem_read(addr);
    else Scache->inbuf.data[j] = mem_hash[last_h].chunk[j + off];
  set_lock(&mem_locker, mem_lock);
  delay = mem_addr_time + (int) ((count + bus_words - 1) / (bus_words)) * mem_read_time;
  startup(&mem_locker, delay);
  data->state = 3; wait(delay);
}

This code is used in section 217.
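
The delay charged here covers only the octabytes that the D-cache block did not already supply, rounded up to whole bus transfers. A minimal sketch of that arithmetic, with a hypothetical helper fill_delay and made-up timing values in the usage note:

```c
#include <assert.h>

/* Cycles charged for fetching the part of an s_bb-byte S-cache block that
   a d_bb-byte D-cache block does not cover; partial transfers round up. */
static int fill_delay(int s_bb, int d_bb, int bus_words,
                      int addr_time, int read_time) {
  int count = (s_bb - d_bb) >> 3;   /* octabytes charged, as block_diff >> 3 */
  assert(s_bb >= d_bb && bus_words > 0);
  return addr_time + ((count + bus_words - 1) / bus_words) * read_time;
}
```

For example, with 64-byte S-cache blocks, 32-byte D-cache blocks, a two-octabyte bus, an address time of 4 and a read time of 10, the charge is 4 + 2 * 10 = 24 cycles.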

220. ⟨Copy Scache->inbuf to slot p 220⟩ ≡
{
  register octa *d = p->data;
  p->data = Scache->inbuf.data; Scache->inbuf.data = d;
}

This code is used in section 217.



access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
bb: int, §167.
bus_words: int, §214.
cache = struct, §167.
cache_addr = macro (), §192.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
chunk: octa *, §206.
copy_block: static void (), §185.
copy_in_time: int, §167.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
filler: coroutine, §167.
flush_cache: static void (), §203.
flush_to_S = 96, §129.
flusher: coroutine, §167.
h: tetra, §17.
hit_set: cacheset, §194.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_hash: chunknode *, §207.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
mode: int, §167.
next: coroutine *, §23.
octa = struct, §17.
outbuf: cacheblock, §167.
p: register cacheblock *, §258.
ptr_a: void *, §44.
ptr_b: void *, §44.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
true = 1, §11.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
WRITE_ALLOC = 2, §166.
WRITE_BACK = 1, §166.
221. Here we assume that the granularity is 8.

⟨Handle write-around when flushing to the S-cache 221⟩ ≡
if (Scache->flusher.next) wait(1);
Scache->outbuf.tag.h = c->outbuf.tag.h;
Scache->outbuf.tag.l = c->outbuf.tag.l & (-Scache->bb);
for (j = 0; j < Scache->bb >> Scache->g; j++) Scache->outbuf.dirty[j] = false;
copy_block(c, &(c->outbuf), Scache, &(Scache->outbuf));
startup(&Scache->flusher, Scache->copy_out_time);
goto terminate;

This code is used in section 217.

222. The S-cache gets new data from memory by invoking a fill_from_mem coroutine;
the I-cache or D-cache may also invoke a fill_from_mem coroutine, if there is no
S-cache. When such a coroutine is invoked, it holds mem_lock, and its caller has gone
to sleep. A physical memory address is given in data->z.o, and data->ptr_a specifies
either Icache, Dcache, or Scache. Furthermore, data->ptr_b specifies a block within
that cache, determined by the alloc_slot routine. The coroutine simulates reading
the contents of the specified memory location, places the result in the x.o field of its
caller’s control block, and wakes up the caller. It proceeds to fill the cache’s inbuf
and, ultimately, the specified cache block, before waking the caller again.

Let c = data->ptr_a. The caller is then c->fill_lock, if this variable is nonnull.
However, the caller might not wish to be awoken or to receive the data (for example,
if it has been aborted). In such cases c->fill_lock will be NULL; the filling action continues
without the wakeup calls. If c = Scache, the S-cache will be locked and the caller will
not have been aborted.

⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_mem:
  { register cache *c = (cache *) data->ptr_a;
    register coroutine *cc = c->fill_lock;
    switch (data->state) {
    case 0: data->x.o = mem_read(data->z.o);
      if (cc) {
        cc->ctl->x.o = data->x.o; awaken(cc, mem_read_time);
      }
      data->state = 1;
      ⟨Read data into c->inbuf and wait for the bus 223⟩;
    case 1: release_lock(self, mem_lock); data->state = 2;
    case 2: if (c != Scache) {
        if (c->lock) wait(1);
        set_lock(self, c->lock);
      }
      if (cc) awaken(cc, c->copy_in_time); /* the second wakeup call */
      load_cache(c, (cacheblock *) data->ptr_b);
      data->state = 3; wait(c->copy_in_time);
    case 3: goto terminate;
    }
  }
223. If c’s cache size is no larger than the memory bus, we wait an extra cycle, so
that there will be two wakeup calls.

⟨Read data into c->inbuf and wait for the bus 223⟩ ≡
{
  register int count, off;
  c->inbuf.tag = data->z.o; c->inbuf.tag.l &= -c->bb;
  count = c->bb >> 3, off = (c->inbuf.tag.l & 0xffff) >> 3;
  for (i = 0; i < count; i++, off++) c->inbuf.data[i] = mem_hash[last_h].chunk[off];
  if (count <= bus_words) wait(1 + mem_read_time)
  else wait((int) (count / bus_words) * mem_read_time);
}

This code is used in section 222.



alloc_slot: static cacheblock *(), §205.
awaken = macro (), §125.
bb: int, §167.
bus_words: int, §214.
c: register cache *, §217.
cache = struct, §167.
cacheblock = struct, §167.
chunk: octa *, §206.
copy_block: static void (), §185.
copy_in_time: int, §167.
copy_out_time: int, §167.
coroutine = struct, §23.
ctl: control *, §23.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
false = 0, §11.
fill_from_mem = 95, §129.
fill_lock: lockvar, §167.
flusher: coroutine, §167.
g: int, §167.
h: tetra, §17.
i: register int, §12.
Icache: cache *, §168.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
last_h: int, §211.
load_cache: static void (), §201.
lock: lockvar, §167.
mem_hash: chunknode *, §207.
mem_lock: lockvar, §214.
mem_read: octa (), §210.
mem_read_time: int, §214.
next: coroutine *, §23.
o: octa, §40.
outbuf: cacheblock, §167.
ptr_a: void *, §44.
ptr_b: void *, §44.
release_lock = macro (), §37.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
wait = macro (), §125.
x: specnode, §44.
z: spec, §44.
224. The fill_from_S coroutine has the same conventions as fill_from_mem, except
that the data comes directly from the S-cache if it is present there. This is the filler
coroutine for the I-cache and D-cache if an S-cache is present.

⟨Cases for control of special coroutines 126⟩ +≡
case fill_from_S:
  { register cache *c = (cache *) data->ptr_a;
    register coroutine *cc = c->fill_lock;
    p = (cacheblock *) data->ptr_c;
    switch (data->state) {
    case 0: p = cache_search(Scache, data->z.o);
      if (p) goto S_non_miss;
      data->state = 1;
    case 1: ⟨Start the S-cache filler 225⟩;
      data->state = 2; sleep;
    case 2: if (cc) {
        cc->ctl->x.o = data->x.o; /* this data has been supplied by Scache->filler */
        awaken(cc, Scache->access_time); /* we propagate it back */
      }
      data->state = 3; sleep; /* when we awake, the S-cache will have our data */
    S_non_miss: if (cc) {
        cc->ctl->x.o = p->data[(data->z.o.l & (Scache->bb - 1)) >> 3];
        awaken(cc, Scache->access_time);
      }
    case 3: ⟨Copy data from p into c->inbuf 226⟩;
      data->state = 4; wait(Scache->access_time);
    case 4: Scache->lock = NULL; /* we had been holding that lock */
      data->state = 5;
    case 5: if (c->lock) wait(1);
      set_lock(self, c->lock);
      load_cache(c, (cacheblock *) data->ptr_b);
      data->state = 6; wait(c->copy_in_time);
    case 6: if (cc) awaken(cc, 1); /* second wakeup call */
      goto terminate;
    }
  }
225. We are already holding the Scache->lock, but we’re about to take on the
Scache->fill_lock too (with the understanding that one is “stronger” than the other).
For a short time the Scache->lock will point to us but we will point to Scache->fill_lock;
this will not cause difficulty, because the present coroutine is not abortable.

⟨Start the S-cache filler 225⟩ ≡
if (Scache->filler.next || mem_lock) wait(1);
p = alloc_slot(Scache, data->z.o);
if (!p) wait(1);
set_lock(&Scache->filler, mem_lock);
set_lock(self, Scache->fill_lock);
data->ptr_c = Scache->filler_ctl.ptr_b = (void *) p;
Scache->filler_ctl.z.o = data->z.o;
startup(&Scache->filler, mem_addr_time);

This code is used in section 224.

226. The S-cache blocks might be wider than the blocks of the I-cache or D-cache,
so the copying in this step isn’t quite trivial.

⟨Copy data from p into c->inbuf 226⟩ ≡
{ register int off;
  c->inbuf.tag = data->z.o; c->inbuf.tag.l &= -c->bb;
  for (j = 0, off = (c->inbuf.tag.l & (Scache->bb - 1)) >> 3; j < c->bb >> 3; j++, off++)
    c->inbuf.data[j] = p->data[off];
  release_lock(self, Scache->fill_lock);
  set_lock(self, Scache->lock);
}

This code is used in section 224.
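
The offset computation in this step picks the narrow block out of the wider one. The sketch below isolates it, using uint64_t in place of octa and a hypothetical helper copy_subblock; the lock handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the bb-byte block containing address addr out of a wider s_bb-byte
   block; both sizes are powers of 2 with 8 <= bb <= s_bb. */
static void copy_subblock(uint64_t *inbuf, const uint64_t *wide,
                          uint32_t addr, uint32_t bb, uint32_t s_bb) {
  uint32_t start = addr & (0u - bb);          /* align to the small block */
  uint32_t off = (start & (s_bb - 1)) >> 3;   /* octabyte index within wide */
  uint32_t j;
  assert(bb >= 8 && bb <= s_bb);
  for (j = 0; j < bb >> 3; j++, off++) inbuf[j] = wide[off];
}
```

With a 64-byte wide block and 16-byte narrow blocks, address 0x1234 aligns down to 0x1230, which is octabyte 6 of the wide block, so octabytes 6 and 7 are copied.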



access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
awaken = macro (), §125.
bb: int, §167.
cache = struct, §167.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
copy_in_time: int, §167.
coroutine = struct, §23.
ctl: control *, §23.
data: octa *, §167.
data: register control *, §124.
fill_from_mem = 95, §129.
fill_from_S = 94, §129.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
inbuf: cacheblock, §167.
j: register int, §12.
l: tetra, §17.
load_cache: static void (), §201.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
next: coroutine *, §23.
o: octa, §40.
p: register cacheblock *, §258.
ptr_a: void *, §44.
ptr_b: void *, §44.
ptr_c: void *, §44.
release_lock = macro (), §37.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
sleep = macro, §125.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
terminate: label, §125.
wait = macro (), §125.
x: specnode, §44.
z: spec, §44.
227. The instruction PRELD X,$Y,$Z generates ⌊X/2^b⌋ commands if there are 2^b
bytes per block in the D-cache. These commands will try to preload blocks $Y + $Z,
$Y + $Z + 2^b, ..., into the cache if it is not too busy.

Similar considerations apply to the instructions PREGO X,$Y,$Z and PREST X,$Y,$Z.

⟨Special cases of instruction dispatch 117⟩ +≡
case preld: case prest: if (!Dcache) goto noop_inst;
  if (cool->xx >= Dcache->bb) cool->interim = true;
  cool->ptr_a = (void *) mem.up; break;
case prego: if (!Icache) goto noop_inst;
  if (cool->xx >= Icache->bb) cool->interim = true;
  cool->ptr_a = (void *) mem.up; break;

228. If the block size is 64, a command like PREST 200,$Y,$Z is actually issued
as four commands PREST 200,$Y,$Z; PREST 191,$Y,$Z; PREST 127,$Y,$Z;
PREST 63,$Y,$Z. An interruption will then be able to resume properly. In the
pipeline, the instruction PREST 200,$Y,$Z is considered to affect bytes $Y + $Z + 192
through $Y + $Z + 200, or fewer bytes if $Y + $Z is not a multiple of 64. (Remember
that these instructions are only hints; we act on them only if it is reasonably
convenient to do so.)

⟨Get ready for the next step of PRELD or PREST 228⟩ ≡
  head->inst = (head->inst & ~((Dcache->bb - 1) << 16)) - 0x10000;

This code is used in section 81. 
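
The masking step of section 228 can be tried in isolation. The following is only a sketch, not part of the simulator: it assumes that the X field occupies bits 16 through 23 of a 32-bit instruction word and that the block size is a power of two, and the helper names (next_preload_step, x_field, preload_steps) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the low-order bits of the X field (bits 16..23)
   and subtract one from X, so that the next command ends just before the
   current block.  bb is the block size in bytes, assumed a power of two. */
static uint32_t next_preload_step(uint32_t inst, uint32_t bb)
{
    assert(bb != 0 && (bb & (bb - 1)) == 0);
    return (inst & ~((bb - 1) << 16)) - 0x10000;
}

/* Extract the X field of an instruction word. */
static unsigned x_field(uint32_t inst)
{
    return (inst >> 16) & 0xff;
}

/* Record the X values of the successive commands generated for a request
   with count x; returns the number of commands.  At most max values are
   stored in out[].  The opcode byte 0xba (PREST) is used for illustration. */
static int preload_steps(unsigned x, uint32_t bb, unsigned out[], int max)
{
    uint32_t inst = 0xba000000u | ((uint32_t)(x & 0xff) << 16);
    int n = 0;
    for (;;) {
        if (n < max) out[n] = x_field(inst);
        n++;
        if (x_field(inst) < bb) break;   /* last step: fits in one block */
        inst = next_preload_step(inst, bb);
    }
    return n;
}
```

With a block size of 64, a count of 200 yields the sequence 200, 191, 127, 63 described in the text.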

229. ⟨Get ready for the next step of PREGO 229⟩ ≡
  head->inst = (head->inst & ~((Icache->bb - 1) << 16)) - 0x10000;

This code is used in section 81. 

230. Another coroutine, called cleanup, is occasionally called into action to remove
dirty data from the D-cache and S-cache. If it is invoked by starting in state 0, with
its i field set to sync, it will clean everything. It can also be invoked in state 4, with
its i field set to syncd and with a physical address in its z.o field; then it simply makes
sure that no D-cache or S-cache blocks associated with that address are dirty.

Field x.o.h should be set to zero if items are expected to remain in the cache after
being cleaned; otherwise field x.o.h should be set to sign_bit.

The coroutine that invokes cleanup should hold clean_lock. If that coroutine dies,
because of an interruption, the cleanup coroutine will terminate prematurely.

We assume that the D-cache and S-cache have some sort of way to identify their
first dirty block, if any, in access_time cycles.

⟨Global variables 20⟩ +≡
  coroutine clean_co;
  control clean_ctl;
  lockvar clean_lock;

231. ⟨Initialize everything 22⟩ +≡
  clean_co.ctl = &clean_ctl;
  clean_co.name = "Clean";
  clean_co.stage = cleanup;
  clean_ctl.go.o.l = 4;
232. ⟨Cases for control of special coroutines 126⟩ +≡
  case cleanup: p = (cacheblock *) data->ptr_b;
    switch (data->state) {
      ⟨Cases 0 through 4, for the D-cache 233⟩;
      ⟨Cases 5 through 9, for the S-cache 234⟩;
    case 10: goto terminate;
    }



access_time: int, §167.
bb: int, §167.
cacheblock = struct, §167.
cleanup = 91, §129.
control = struct, §44.
cool: control *, §60.
coroutine = struct, §23.
ctl: control *, §23.
data: register control *, §124.
Dcache: cache *, §168.
go: specnode, §44.
h: tetra, §17.
head: fetch *, §69.
i: register int, §12.
Icache: cache *, §168.
inst: tetra, §68.
interim: bool, §44.
l: tetra, §17.
lockvar = coroutine *, §37.
mem: specnode, §115.
name: char *, §23.
noop_inst: label, §118.
o: octa, §40.
p: register cacheblock *, §258.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
ptr_a: void *, §44.
ptr_b: void *, §44.
sign_bit = macro, §80.
stage: int, §23.
state: int, §44.
sync = 79, §49.
syncd = 64, §49.
terminate: label, §125.
true = 1, §11.
up: specnode *, §40.
x: specnode, §44.
xx: unsigned char, §44.
z: spec, §44.
233. ⟨Cases 0 through 4, for the D-cache 233⟩ ≡
  case 0: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    set_lock(self, Dcache->lock);
    i = j = 0;
  Dclean_loop: p = (i < Dcache->cc ? &(Dcache->set[i][j]) : &(Dcache->victim[j]));
    if (p->tag.h & sign_bit) goto Dclean_inc;
    if (!is_dirty(Dcache, p)) {
      p->tag.h |= data->x.o.h; goto Dclean_inc;
    }
    data->y.o.h = i, data->y.o.l = j;
  Dclean: data->state = 1; data->ptr_b = (void *) p; wait(Dcache->access_time);
  case 1: if (Dcache->flusher.next) wait(1);
    flush_cache(Dcache, p, data->x.o.h == 0);
    p->tag.h |= data->x.o.h;
    release_lock(self, Dcache->lock);
    data->state = 2; wait(Dcache->copy_out_time);
  case 2: if (!clean_lock) goto done;   /* premature termination */
    if (Dcache->flusher.next) wait(1);
    if (data->i != sync) goto Sprep;
    data->state = 3;
  case 3: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    set_lock(self, Dcache->lock);
    i = data->y.o.h, j = data->y.o.l;
  Dclean_inc: j++;
    if (i < Dcache->cc && j == Dcache->aa) j = 0, i++;
    if (i == Dcache->cc && j == Dcache->vv) {
      data->state = 5; wait(Dcache->access_time);
    }
    goto Dclean_loop;
  case 4: if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    set_lock(self, Dcache->lock);
    p = cache_search(Dcache, data->z.o);
    if (p) {
      demote_and_fix(Dcache, p);
      if (is_dirty(Dcache, p)) goto Dclean;
    }
    data->state = 9; wait(Dcache->access_time);

This code is used in section 232. 
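
The order in which Dclean_loop and Dclean_inc walk through the cache may be easier to see without the coroutine machinery. The sketch below is an illustration under simplifying assumptions, not the simulator's code: it merely counts the blocks visited for a cache with cc sets of aa blocks each plus a victim area of vv blocks, and the function name count_cleanup_visits is hypothetical.

```c
#include <assert.h>

/* Visit every block (i,j) in the order used by the cleanup coroutine:
   sets 0..cc-1, each with blocks 0..aa-1, followed by victim blocks
   0..vv-1 (addressed as i == cc).  Returns the number of blocks visited. */
static int count_cleanup_visits(int cc, int aa, int vv)
{
    int i = 0, j = 0, visits = 0;
    assert(cc > 0 && aa > 0 && vv >= 0);
    for (;;) {
        visits++;                              /* examine block (i,j) here */
        j++;
        if (i < cc && j == aa) j = 0, i++;     /* move on to the next set */
        if (i == cc && j == vv) return visits; /* victim area exhausted */
    }
}
```

The count is cc*aa + vv, and the loop also terminates correctly when there is no victim cache (vv = 0).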

234. ⟨Cases 5 through 9, for the S-cache 234⟩ ≡
  case 5: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (!Scache) goto done;
    if (Scache->lock) wait(1);
    set_lock(self, Scache->lock);
    i = j = 0;
  Sclean_loop: p = (i < Scache->cc ? &(Scache->set[i][j]) : &(Scache->victim[j]));
    if (p->tag.h & sign_bit) goto Sclean_inc;
    if (!is_dirty(Scache, p)) {
      p->tag.h |= data->x.o.h; goto Sclean_inc;
    }
    data->y.o.h = i, data->y.o.l = j;
  Sclean: data->state = 6; data->ptr_b = (void *) p; wait(Scache->access_time);
  case 6: if (Scache->flusher.next) wait(1);
    flush_cache(Scache, p, data->x.o.h == 0);
    p->tag.h |= data->x.o.h;
    release_lock(self, Scache->lock);
    data->state = 7; wait(Scache->copy_out_time);
  case 7: if (!clean_lock) goto done;   /* premature termination */
    if (Scache->flusher.next) wait(1);
    if (data->i != sync) goto done;
    data->state = 8;
  case 8: if (Scache->lock) wait(1);
    set_lock(self, Scache->lock);
    i = data->y.o.h, j = data->y.o.l;
  Sclean_inc: j++;
    if (i < Scache->cc && j == Scache->aa) j = 0, i++;
    if (i == Scache->cc && j == Scache->vv) {
      data->state = 10; wait(Scache->access_time);
    }
    goto Sclean_loop;
  Sprep: data->state = 9;
  case 9: if (self->lockloc) release_lock(self, Dcache->lock);
    if (!Scache) goto done;
    if (Scache->lock) wait(1);
    set_lock(self, Scache->lock);
    p = cache_search(Scache, data->z.o);
    if (p) {
      demote_and_fix(Scache, p);
      if (is_dirty(Scache, p)) goto Sclean;
    }
    data->state = 10; wait(Scache->access_time);

This code is used in section 232. 



aa: int, §167.
access_time: int, §167.
cache_search: static cacheblock *(), §193.
cc: int, §167.
clean_lock: lockvar, §230.
copy_out_time: int, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
done: label, §125.
flush_cache: static void (), §203.
flusher: coroutine, §167.
get_reader: static int (), §183.
h: tetra, §17.
i: internal_opcode, §44.
i: register int, §12.
is_dirty: static bool (), §170.
j: register int, §12.
l: tetra, §17.
lock: lockvar, §167.
lockloc: coroutine **, §23.
next: coroutine *, §23.
o: octa, §40.
p: register cacheblock *, §258.
ptr_b: void *, §44.
reader: coroutine *, §167.
release_lock = macro (), §37.
Scache: cache *, §168.
self: register coroutine *, §124.
set: cacheset *, §167.
set_lock = macro (), §37.
sign_bit = macro, §80.
startup: static void (), §31.
state: int, §44.
sync = 79, §49.
tag: octa, §167.
victim: cacheset, §167.
vv: int, §167.
wait = macro (), §125.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
235. Virtual address translation. Special arrays of coroutines and control
blocks come into play when we need to implement MMIX’s rather complicated page
table mechanism for virtual address translation. In effect, we have up to ten control
blocks outside of the reorder buffer that are capable of executing instructions just as
if they were part of that buffer. The “opcodes” of these non-abortable instructions
are special internal operations called ldptp and ldpte, for loading page table pointers
and page table entries.

Suppose, for example, that we need to translate a virtual address for the DT-cache
in which the virtual page address $(a_4 a_3 a_2 a_1 a_0)_{1024}$ of segment $i$ has $a_4 = a_3 = 0$
and $a_2 \ne 0$. Then the rules say that we should first find a page table pointer $p_2$ in
physical location $2^{13}(r + b_i + 2) + 8a_2$, then another page table pointer $p_1$ in location
$p_2 + 8a_1$, and finally the page table entry $p_0$ in location $p_1 + 8a_0$. The simulator
achieves this by setting up three coroutines $c_0$, $c_1$, $c_2$ whose control blocks correspond
to the pseudo-instructions

    LDPTP x,[2^63 + 2^13(r + b_i + 2)],8a_2
    LDPTP x,x,8a_1
    LDPTE x,x,8a_0

where x is a hidden internal register and the other quantities are immediate values.
Slight changes to the normal functionality of LDO give us the actions needed to
implement LDPTP and LDPTE. Coroutine $c_j$ corresponds to the instruction that involves
$a_j$ and computes $p_j$; when $c_0$ has computed its value $p_0$, we know how to translate
the original virtual address.

The LDPTP and LDPTE commands return zero if their y operand is zero or if the
page table does not properly match rV.

  #define LDPTP PREGO   /* internally this won’t cause confusion */
  #define LDPTE GO

⟨Global variables 20⟩ +≡
  control IPTctl[5], DPTctl[5];     /* control blocks for I and D page translation */
  coroutine IPTco[10], DPTco[10];   /* each coroutine is a two-stage pipeline */
  char *IPTname[5] = {"IPT0", "IPT1", "IPT2", "IPT3", "IPT4"};
  char *DPTname[5] = {"DPT0", "DPT1", "DPT2", "DPT3", "DPT4"};
236. ⟨Initialize everything 22⟩ +≡
  for (j = 0; j < 5; j++) {
    DPTco[2 * j].ctl = &DPTctl[j]; IPTco[2 * j].ctl = &IPTctl[j];
    if (j > 0) DPTctl[j].op = IPTctl[j].op = LDPTP, DPTctl[j].i = IPTctl[j].i = ldptp;
    else DPTctl[0].op = IPTctl[0].op = LDPTE, DPTctl[0].i = IPTctl[0].i = ldpte;
    IPTctl[j].loc = DPTctl[j].loc = neg_one;
    IPTctl[j].go.o = DPTctl[j].go.o = incr(neg_one, 4);
    IPTctl[j].ptr_a = DPTctl[j].ptr_a = (void *) &mem;
    IPTctl[j].ren_x = DPTctl[j].ren_x = true;
    IPTctl[j].x.addr.h = DPTctl[j].x.addr.h = -1;
    IPTco[2 * j].stage = DPTco[2 * j].stage = 1;
    IPTco[2 * j + 1].stage = DPTco[2 * j + 1].stage = 2;
    IPTco[2 * j].name = IPTco[2 * j + 1].name = IPTname[j];
    DPTco[2 * j].name = DPTco[2 * j + 1].name = DPTname[j];
  }
  ITcache->filler_ctl.ptr_c = (void *) &IPTco[0];
  DTcache->filler_ctl.ptr_c = (void *) &DPTco[0];



addr: octa, §40.
control = struct, §44.
coroutine = struct, §23.
ctl: control *, §23.
DTcache: cache *, §168.
filler_ctl: control, §167.
GO = #9e, §47.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
ITcache: cache *, §168.
j: register int, §10.
ldpte = 58, §49.
ldptp = 57, §49.
loc: octa, §44.
mem: specnode, §115.
name: char *, §23.
neg_one: octa, MMIX-ARITH §4.
o: octa, §40.
op: mmix_opcode, §44.
PREGO = #9c, §47.
ptr_a: void *, §44.
ptr_c: void *, §44.
ren_x: bool, §44.
stage: int, §23.
true = 1, §11.
x: specnode, §44.
237. Page table calculations are invoked by a coroutine of type fill_from_virt, which
is used to fill the IT-cache or DT-cache. The calling conventions of fill_from_virt are
analogous to those of fill_from_mem or fill_from_S: A virtual address is supplied in
data->y.o, and data->ptr_a points to a cache (ITcache or DTcache), while data->ptr_b
is a block in that cache. We wake up the caller, who holds the cache’s fill_lock, as
soon as the translation of the given address has been calculated, unless the caller has
been aborted. (No second wakeup call is necessary.)

⟨Cases for control of special coroutines 126⟩ +≡
  case fill_from_virt:
    { register cache *c = (cache *) data->ptr_a;
      register coroutine *cc = c->fill_lock;
      register coroutine *co = (coroutine *) data->ptr_c;
                                    /* &IPTco[0] or &DPTco[0] */
      octa aaaaa;
      switch (data->state) {
      case 0: ⟨Start up auxiliary coroutines to compute the page table entry 243⟩;
        data->state = 1;
      case 1: if (data->b.p) {
          if (data->b.p->known) data->b.o = data->b.p->o, data->b.p = NULL;
          else wait(1);
        }
        ⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩;
        data->state = 2;
      case 2: if (c->lock) wait(1);
        set_lock(self, c->lock);
        load_cache(c, (cacheblock *) data->ptr_b);
        data->state = 3; wait(c->copy_in_time);
      case 3: data->b.o = zero_octa; goto terminate;
      }
    }



238. The current contents of rV, the special virtual translation register, are kept
unpacked in several global variables page_r, page_s, etc., for convenience. Whenever
rV changes, we recompute all these variables.

⟨Global variables 20⟩ +≡
  int page_n;          /* the 10-bit n field of rV, times 8 */
  int page_r;          /* the 27-bit r field of rV */
  int page_s;          /* the 8-bit s field of rV */
  int page_f;          /* the 3-bit f field of rV */
  int page_b[5];       /* the 4-bit b fields of rV; page_b[0] = 0 */
  octa page_mask;      /* the least significant s bits */
  bool page_bad = true;   /* does rV violate the rules? */



239. ⟨Update the page variables 239⟩ ≡
  { octa rv;
    rv = data->z.o;
    page_f = rv.l & 7, page_bad = (page_f > 1);
    page_n = rv.l & 0x1ff8;
    rv = shift_right(rv, 13, 1);
    page_r = rv.l & 0x7ffffff;
    rv = shift_right(rv, 27, 1);
    page_s = rv.l & 0xff;
    if (page_s < 13 || page_s > 48) page_bad = true;
    else if (page_s < 32) page_mask.h = 0, page_mask.l = (1 << page_s) - 1;
    else page_mask.h = (1 << (page_s - 32)) - 1, page_mask.l = 0xffffffff;
    page_b[4] = (rv.l >> 8) & 0xf;
    page_b[3] = (rv.l >> 12) & 0xf;
    page_b[2] = (rv.l >> 16) & 0xf;
    page_b[1] = (rv.l >> 20) & 0xf;
  }

This code is used in section 329. 
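
The field extraction above can be restated with a single 64-bit integer instead of the two-tetra octa type. The following is a sketch under that assumption; the struct rv_fields and the function unpack_rv are hypothetical names, and the field layout (four 4-bit b fields, then s, r, n, f from most to least significant) is the one implied by the shifts and masks of section 239.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the unpacked fields of rV. */
typedef struct {
    int b[5];        /* b[0] = 0; b[1]..b[4] are the 4-bit b fields */
    int s;           /* the 8-bit s field */
    uint32_t r;      /* the 27-bit r field */
    int n8;          /* the 10-bit n field, times 8 */
    int f;           /* the 3-bit f field */
    uint64_t mask;   /* the least significant s bits (0 if s is out of range) */
    bool bad;        /* does rV violate the rules? */
} rv_fields;

static rv_fields unpack_rv(uint64_t rv)
{
    rv_fields v;
    v.f = (int)(rv & 7);
    v.bad = (v.f > 1);
    v.n8 = (int)(rv & 0x1ff8);              /* bits 3..12, still scaled by 8 */
    v.r = (uint32_t)((rv >> 13) & 0x7ffffff);
    v.s = (int)((rv >> 40) & 0xff);
    v.mask = 0;
    if (v.s < 13 || v.s > 48) v.bad = true;
    else v.mask = ((uint64_t)1 << v.s) - 1;
    v.b[0] = 0;
    v.b[1] = (int)((rv >> 60) & 0xf);
    v.b[2] = (int)((rv >> 56) & 0xf);
    v.b[3] = (int)((rv >> 52) & 0xf);
    v.b[4] = (int)((rv >> 48) & 0xf);
    return v;
}
```

For instance, the value 0x12340D0000002008 unpacks to b = (0, 1, 2, 3, 4), s = 13, r = 1, n = 1 (stored as 8), and f = 0.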

240. Here’s how we compute a tag of the IT-cache or DT-cache from a virtual
address, and how we compute a physical address from a translation found in the
cache.

  #define trans_key(addr) incr(oandn(addr, page_mask), page_n)

⟨Internal prototypes 13⟩ +≡
  static octa phys_addr ARGS((octa, octa));

241. ⟨Subroutines 14⟩ +≡
  static octa phys_addr(virt, trans)
    octa virt, trans;
  { octa t;
    t = oandn(trans, page_mask);   /* zero out the ynp fields of a PTE */
    return oplus(t, oand(virt, page_mask));
  }
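
As an illustration of sections 240 and 241, here is the same arithmetic written with 64-bit integers. It is a sketch, not the simulator's code: page_mask and page_n are passed as parameters rather than read from globals, oandn(y, z) is taken to mean y & ~z, and the two function names carry a `64` suffix to mark them as hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tag for the translation caches: the virtual page address with the
   address-space number (n, already multiplied by 8) in the low bits. */
static uint64_t trans_key64(uint64_t virt, uint64_t page_mask, uint64_t page_n)
{
    assert((page_n & ~page_mask) == 0);   /* n must fit below the page bits */
    return (virt & ~page_mask) + page_n;
}

/* Physical address: page frame from the translation, offset from the
   virtual address.  The low-order fields of the translation are discarded. */
static uint64_t phys_addr64(uint64_t virt, uint64_t trans, uint64_t page_mask)
{
    return (trans & ~page_mask) + (virt & page_mask);
}
```

With 8K pages (page_mask = 0x1fff), virtual address 0x123456 and translation 0x40000007 combine to physical address 0x40001456.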

242. Cheap (and slow) versions of MMIX leave the page table calculations to software.
If the global variable no_hardware_PT is set true, fill_from_virt begins its actions in
state 1, not state 0. (See the RESUME_TRANS operation.)

⟨External variables 4⟩ +≡
  Extern bool no_hardware_PT;



ARGS = macro, §6.
b: spec, §44.
b, MMIX-DOC §45.
bool = enum, §11.
cache = struct, §167.
cacheblock = struct, §167.
copy_in_time: int, §167.
coroutine = struct, §23.
data: register control *, §124.
DPTco: coroutine [], §235.
DTcache: cache *, §168.
Extern = macro, §4.
f: register int, §75.
fill_from_mem = 95, §129.
fill_from_S = 94, §129.
fill_from_virt = 93, §129.
fill_lock: lockvar, §167.
h: tetra, §17.
inbuf: cacheblock, §167.
incr: octa (), MMIX-ARITH §6.
IPTco: coroutine [], §235.
ITcache: cache *, §168.
known: bool, §40.
l: tetra, §17.
load_cache: static void (), §201.
lock: lockvar, §167.
n, MMIX-DOC §45.
o: octa, §40.
oand: octa (), MMIX-ARITH §25.
oandn: octa (), MMIX-ARITH §25.
octa = struct, §17.
oplus: octa (), MMIX-ARITH §5.
p: specnode *, §40.
ptr_a: void *, §44.
ptr_b: void *, §44.
ptr_c: void *, §44.
r, MMIX-DOC §45.
RESUME_TRANS = 3, §320.
s, MMIX-DOC §45.
self: register coroutine *, §124.
set_lock = macro (), §37.
shift_right: octa (), MMIX-ARITH §7.
state: int, §44.
terminate: label, §125.
true = 1, §11.
wait = macro (), §125.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
243. Note: The operating system is supposed to ensure that changes to the page
table entries do not appear in the pipeline when a translation cache is being updated.
The internal LDPTP and LDPTE instructions use only the “hot state” of the memory
system.

⟨Start up auxiliary coroutines to compute the page table entry 243⟩ ≡
  aaaaa = data->y.o;
  i = aaaaa.h >> 29;                          /* the segment number */
  aaaaa.h &= 0x1fffffff;                      /* the address within segment i */
  aaaaa = shift_right(aaaaa, page_s, 1);      /* the page address */
  for (j = 0; aaaaa.l != 0 || aaaaa.h != 0; j++) {
    co[2 * j].ctl->z.o.h = 0, co[2 * j].ctl->z.o.l = (aaaaa.l & 0x3ff) << 3;
    aaaaa = shift_right(aaaaa, 10, 1);
  }
  if (page_b[i + 1] < page_b[i] + j)   /* address too large */
    ;   /* nothing needs to be done, since data->b.o is zero */
  else {
    if (j == 0) j = 1, co[0].ctl->z.o = zero_octa;
    ⟨Issue j pseudo-instructions to compute a page table entry 244⟩;
  }

This code is used in section 237. 

244. The first stage of coroutine $c_j$ is co[2 * j]. It will pass the jth control block to
the second stage, co[2 * j + 1], which will load page table information from memory
(or hopefully from the D-cache).

⟨Issue j pseudo-instructions to compute a page table entry 244⟩ ≡
  j--;
  aaaaa.l = page_r + page_b[i] + j;
  co[2 * j].ctl->y.p = NULL;
  co[2 * j].ctl->y.o = shift_left(aaaaa, 13);
  co[2 * j].ctl->y.o.h += sign_bit;
  for ( ; ; j--) {
    co[2 * j].ctl->x.o = zero_octa; co[2 * j].ctl->x.known = false;
    co[2 * j].ctl->owner = &co[2 * j];
    startup(&co[2 * j], 1);
    if (j == 0) break;
    co[2 * (j - 1)].ctl->y.p = &co[2 * j].ctl->x;
  }
  data->b.p = &co[0].ctl->x;

This code is used in section 243. 
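
The address arithmetic of sections 243 and 244 can be followed without the coroutines. The sketch below, which uses 64-bit integers and hypothetical names (pt_walk, plan_walk), only plans the walk: it splits the page address into radix-1024 digits, checks the segment limit, and computes the physical base address of the first table to consult. It performs no memory accesses and is not the simulator's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a planned page table walk. */
typedef struct {
    bool ok;              /* false if the address is too large for segment i */
    int count;            /* number of loads: count-1 LDPTPs and one LDPTE */
    uint64_t base;        /* 2^63 + 2^13 * (r + b[i] + count - 1) */
    uint64_t offset[5];   /* offset[j] = 8 * a_j, used by coroutine c_j */
} pt_walk;

/* s, r, and b[0..4] are the unpacked rV fields, with b[0] = 0. */
static pt_walk plan_walk(uint64_t virt, int s, uint32_t r, const int b[5])
{
    pt_walk w = { false, 0, 0, { 0, 0, 0, 0, 0 } };
    int i, j = 0;
    uint64_t a;
    assert((virt >> 63) == 0);        /* only nonnegative addresses are virtual */
    assert(s >= 13 && s <= 48);
    i = (int)(virt >> 61);            /* the segment number, 0..3 */
    a = (virt & 0x1fffffffffffffffULL) >> s;   /* the page address */
    for (; a != 0; j++) {
        w.offset[j] = (a & 0x3ff) << 3;
        a >>= 10;
    }
    if (b[i + 1] < b[i] + j) return w;         /* address too large */
    if (j == 0) j = 1;                         /* page 0 still needs one load */
    w.ok = true;
    w.count = j;
    w.base = ((uint64_t)1 << 63) + (((uint64_t)r + (uint64_t)b[i] + (uint64_t)(j - 1)) << 13);
    return w;
}
```

For the example of section 235, with $a_2 = 1$, $a_1 = 2$, $a_0 = 3$, $s = 13$, $r = 1$, and $b_0 = 0$, the plan has three loads starting from base $2^{63} + 2^{13}\cdot 3$, with offsets 8, 16, and 24 for $c_2$, $c_1$, and $c_0$ respectively.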
245. At this point the translation of the given virtual address data->y.o is the
octabyte data->b.o. Its least significant three bits are the protection code $p = p_r p_w p_x$;
its page address field is scaled by $2^s$. It is entirely zero, including the protection bits,
if there was a page table failure.

The z field of the caller receives this translation.

⟨Compute the new entry for c->inbuf and give the caller a sneak preview 245⟩ ≡
  c->inbuf.tag = trans_key(data->y.o);
  c->inbuf.data[0] = data->b.o;
  if (cc) {
    cc->ctl->z.o = data->b.o;
    awaken(cc, 1);
  }

This code is used in section 237. 
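
A caller that receives such a translation has to distinguish failure from success and pick out the protection bits. The following sketch shows one way to do that, assuming the translation is held in a 64-bit integer; the type translation_info and the function decode_translation are hypothetical names, not part of the simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of a translation octabyte. */
typedef struct {
    bool valid;            /* false if the page table lookup failed */
    bool readable;         /* p_r, bit 2 */
    bool writable;         /* p_w, bit 1 */
    bool executable;       /* p_x, bit 0 */
} translation_info;

static translation_info decode_translation(uint64_t t)
{
    translation_info info;
    info.valid = (t != 0);         /* entirely zero means failure */
    info.readable = ((t >> 2) & 1) != 0;
    info.writable = ((t >> 1) & 1) != 0;
    info.executable = (t & 1) != 0;
    return info;
}
```

A translation with low bits 101, for example, describes a page that may be read and executed but not written.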



aaaaa: octa, §237.
awaken = macro (), §125.
b: spec, §44.
c: register cache *, §237.
cc: register coroutine *, §237.
co: register coroutine *, §237.
ctl: control *, §23.
data: register control *, §124.
data: octa *, §167.
false = 0, §11.
h: tetra, §17.
i: register int, §12.
inbuf: cacheblock, §167.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
o: octa, §40.
owner: coroutine *, §44.
p: specnode *, §40.
page_b: int [], §238.
page_r: int, §238.
page_s: int, §238.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
startup: static void (), §31.
tag: octa, §167.
trans_key = macro (), §240.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
246. The write buffer. The dispatcher has arranged things so that speculative
stores into memory are recorded in a doubly linked list leading upward from mem.
When such instructions finally are committed, they enter the “write buffer,” which
holds octabytes that are ready to be written into designated physical memory
addresses (or into the D-cache and/or S-cache). The “hot state” of the computation is
reflected not only by the registers and caches but also by the instructions that are
pending in the write buffer.

⟨Type definitions 11⟩ +≡
  typedef struct {
    octa o;              /* data to be stored */
    octa addr;           /* its physical address */
    tetra stamp;         /* when last committed (mod 2^32) */
    internal_opcode i;   /* is this write special? */
    int size;            /* parameter for spec_write */
  } write_node;

247. We represent the buffer in the usual way as a circular list, with elements
write_tail + 1, write_tail + 2, . . . , write_head.

The data will sit at least holding_time cycles before it leaves the write buffer.
This speeds things up when different fields of the same octabyte are being stored by
different instructions.

⟨External variables 4⟩ +≡
  Extern write_node *wbuf_bot, *wbuf_top;
                             /* least and greatest write buffer nodes */
  Extern write_node *write_head, *write_tail;
                             /* front and rear of the write buffer */
  Extern lockvar wbuf_lock;  /* is the data in write_head being written? */
  Extern int holding_time;   /* minimum holding time */
  Extern lockvar speed_lock; /* should we ignore holding_time? */
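
The pointer discipline of this circular list, in which both the head and the tail move downward and wrap from the bottom node to the top node, can be sketched as follows. This is a simplified model with hypothetical names (wnode, wbuf, wbuf_put, wbuf_get) and a fixed capacity; it omits the stamp, opcode, and size fields as well as all locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WBUF_NODES 4   /* holds at most WBUF_NODES - 1 items */

typedef struct { uint64_t addr, data; } wnode;

typedef struct {
    wnode buf[WBUF_NODES];
    wnode *bot, *top;    /* least and greatest nodes */
    wnode *head, *tail;  /* items occupy tail+1, tail+2, ..., head (cyclically) */
} wbuf;

static void wbuf_init(wbuf *w)
{
    w->bot = w->buf;
    w->top = w->buf + WBUF_NODES - 1;
    w->head = w->tail = w->top;           /* empty when head == tail */
}

/* Enter a new item at the rear; returns false if the buffer is full. */
static bool wbuf_put(wbuf *w, uint64_t addr, uint64_t data)
{
    wnode *p = (w->tail == w->bot ? w->top : w->tail - 1);
    if (p == w->head) return false;       /* the write buffer is full */
    w->tail->addr = addr;
    w->tail->data = data;
    w->tail = p;
    return true;
}

/* Remove the oldest item from the front; returns false if empty. */
static bool wbuf_get(wbuf *w, wnode *out)
{
    assert(out != 0);
    if (w->head == w->tail) return false;
    *out = *w->head;
    w->head = (w->head == w->bot ? w->top : w->head - 1);
    return true;
}
```

One node is always left unused, so that a full buffer can be told apart from an empty one by comparing pointers alone.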

248. ⟨Global variables 20⟩ +≡
  coroutine write_co;   /* coroutine that empties the write buffer */
  control write_ctl;    /* its control block */

249. ⟨Initialize everything 22⟩ +≡
  write_co.ctl = &write_ctl;
  write_co.name = "Write";
  write_co.stage = write_from_wbuf;
  write_ctl.ptr_a = (void *) &mem;
  write_ctl.go.o.l = 4;
  startup(&write_co, 1);
  write_head = write_tail = wbuf_top;

250. ⟨Internal prototypes 13⟩ +≡
  static void print_write_buffer ARGS((void));

251. ⟨Subroutines 14⟩ +≡
  static void print_write_buffer()
  {
    printf("Write buffer");
    if (write_head == write_tail) printf(" (empty)\n");
    else { register write_node *p;
      printf(":\n");
      for (p = write_head; p != write_tail; p = (p == wbuf_bot ? wbuf_top : p - 1)) {
        printf("m["); print_octa(p->addr); printf("]="); print_octa(p->o);
        if (p->i == stunc) printf(" unc");
        else if (p->i == sync) printf(" sync");
        printf(" (age %d)\n", ticks.l - p->stamp);
      }
    }
  }

252. The entire present state of the pipeline computation can be visualized by
printing first the write buffer, then the reorder buffer, then the fetch buffer. This
shows the progression of results from oldest to youngest, from sizzling hot to ice cold.

⟨External prototypes 9⟩ +≡
  Extern void print_pipe ARGS((void));

253. ⟨External routines 10⟩ +≡
  void print_pipe()
  {
    print_write_buffer();
    print_reorder_buffer();
    print_fetch_buffer();
  }



ARGS = macro, §6.
control = struct, §44.
coroutine = struct, §23.
ctl: control *, §23.
Extern = macro, §4.
go: specnode, §44.
internal_opcode = enum, §49.
l: tetra, §17.
lockvar = coroutine *, §37.
mem: specnode, §115.
name: char *, §23.
o: octa, §40.
octa = struct, §17.
print_fetch_buffer: static void (), §73.
print_octa: static void (), §19.
print_reorder_buffer: static void (), §63.
printf: int (), <stdio.h>.
ptr_a: void *, §44.
spec_write: extern void (), §208.
stage: int, §23.
startup: static void (), §31.
stunc = 67, §49.
sync = 79, §49.
tetra = unsigned int, §17.
ticks: Extern octa, §87.
write_from_wbuf = 92, §129.
254. The write_search routine looks to see if any instructions ahead of a given place
in the mem list of the reorder buffer are storing into a given physical address, or if
there’s a pending instruction in the write buffer for that address. If so, it returns a
pointer to the value to be written. If not, it returns Λ. If the answer is currently
unknown, because at least one possibly relevant physical address has not yet been
computed, the subroutine returns the special code value DUNNO.

The search starts at the x.up field of a control block for a store instruction, otherwise
at the ptr_a field of the control block, unless ptr_a points to a committed instruction.

The i field in the write buffer is usually st or pst, inherited from a store or partial
store command. It may also be sync (from SYNC 1 or SYNC 3) or stunc (from STUNC).

  #define DUNNO ((octa *) 1)   /* an impossible non-Λ pointer */

⟨Internal prototypes 13⟩ +≡
  static octa *write_search ARGS((control *, octa));

255. ⟨Subroutines 14⟩ +≡
  static octa *write_search(ctl, addr)
    control *ctl;
    octa addr;
  { register specnode *p = (ctl->mem_x ? ctl->x.up : (specnode *) ctl->ptr_a);
    register write_node *q = write_tail;
    addr.l &= -8;
    if (p == &mem) goto qloop;
    if (p > &hot->x && ctl <= hot) goto qloop;   /* already committed */
    if (p < &ctl->x && (ctl <= hot || p > &hot->x)) goto qloop;
    for ( ; p != &mem; p = p->up) {
      if (p->addr.h == (tetra) -1) return DUNNO;
      if ((p->addr.l & -8) == addr.l && p->addr.h == addr.h)
        return (p->known ? &(p->o) : DUNNO);
    }
  qloop: for ( ; ; ) {
      if (q == write_head) return NULL;
      if (q == wbuf_top) q = wbuf_bot; else q++;
      if (q->addr.l == addr.l && q->addr.h == addr.h) return &(q->o);
    }
  }
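
The three possible answers (a pointer to the pending value, Λ, or DUNNO) form the essential contract of this routine. The sketch below models that contract with flat arrays in place of the linked mem list and the circular write buffer; the type pending_store and the function find_pending are hypothetical, and both arrays are assumed to be ordered from the entry nearest the searching instruction to the oldest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DUNNO ((const uint64_t *)1)   /* an impossible non-null pointer */

typedef struct {
    bool addr_known;   /* has the physical address been computed? */
    uint64_t addr;
    bool val_known;    /* has the value to be stored been computed? */
    uint64_t val;
} pending_store;

/* Search uncommitted stores first, then the write buffer. */
static const uint64_t *find_pending(const pending_store *spec, size_t nspec,
                                    const pending_store *wbuf, size_t nwbuf,
                                    uint64_t addr)
{
    size_t k;
    assert(spec != NULL || nspec == 0);
    assert(wbuf != NULL || nwbuf == 0);
    addr &= ~(uint64_t)7;                      /* compare whole octabytes */
    for (k = 0; k < nspec; k++) {
        if (!spec[k].addr_known) return DUNNO; /* might alias; cannot tell yet */
        if ((spec[k].addr & ~(uint64_t)7) == addr)
            return spec[k].val_known ? &spec[k].val : DUNNO;
    }
    for (k = 0; k < nwbuf; k++)                /* committed, hence fully known */
        if (wbuf[k].addr == addr) return &wbuf[k].val;
    return NULL;
}
```

The important point is that an uncomputed address anywhere ahead of the match forces the answer DUNNO, since that store might turn out to hit the same octabyte.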

256. When we’re committing new data to memory, we can update an existing item
in the write buffer if it has the same physical address, unless that item is already in
the process of being written out. Increasing the value of holding_time will increase the
chance that this economy is possible, but it will also increase the number of buffered
items when writes are to different locations.

A store instruction that sets any of the eight interrupt bits rwxnkbsp will not affect
memory, even if it doesn’t cause an interrupt.

When “store” is followed by “store uncached” at the same address, or vice versa,
we believe the most recent hint.

⟨Commit to memory if possible, otherwise break 256⟩ ≡
  { register write_node *q = write_tail;
    if (hot->interrupt & (F_BIT + 0xff)) goto done_with_write;
    if (hot->x.addr.h & 0xffff0000) {
      if (hot->op >= STB && hot->op < STSF) q->size = (hot->op & 0xf) >> 2;
      else if (hot->op >= STSF && hot->op < STCO) q->size = 2;
      else q->size = 3;
    }
    if (hot->i != sync)
      for ( ; ; ) {
        if (q == write_head) break;
        if (q == wbuf_top) q = wbuf_bot; else q++;
        if (q->i == sync) break;
        if (q->addr.l == hot->x.addr.l && q->addr.h == hot->x.addr.h &&
            (q != write_head || !wbuf_lock)) goto addr_found;
      }
    { register write_node *p = (write_tail == wbuf_bot ? wbuf_top : write_tail - 1);
      if (p == write_head) break;   /* the write buffer is full */
      q = write_tail; write_tail = p;
      q->addr = hot->x.addr;
    }
  addr_found: q->o = hot->x.o;
    q->stamp = ticks.l;
    q->i = hot->i;
  done_with_write: spec_rem(&(hot->x));
    mem_slots++;
  }

This code is used in section 146. 



addr: octa, §246.
addr: octa, §40.
ARGS = macro, §6.
control = struct, §44.
F_BIT = 1 << 17, §54.
h: tetra, §17.
holding_time: int, §247.
hot: control *, §60.
i: internal_opcode, §246.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
known: bool, §40.
l: tetra, §17.
mem: specnode, §115.
mem_slots: int, §86.
mem_x: bool, §44.
o: octa, §246.
o: octa, §40.
octa = struct, §17.
op: mmix_opcode, §44.
pst = 66, §49.
ptr_a: void *, §44.
size: int, §246.
spec_rem: static void (), §97.
specnode = struct, §40.
st = 63, §49.
stamp: tetra, §246.
STB = #a0, §47.
STCO = #b4, §47.
STSF = #b0, §47.
stunc = 67, §49.
sync = 79, §49.
tetra = unsigned int, §17.
ticks: Extern octa, §87.
up: specnode *, §40.
wbuf_bot: write_node *, §247.
wbuf_lock: lockvar, §247.
wbuf_top: write_node *, §247.
write_head: write_node *, §247.
write_node = struct, §246.
write_tail: write_node *, §247.
x: specnode, §44.
257. A special coroutine whose duty is to empty the write buffer is always active.
It holds the wbuf_lock while it is writing the contents of write_head. It holds
Dcache->fill_lock while waiting for the D-cache to fill a block.

⟨Cases for control of special coroutines 126⟩ +≡
  case write_from_wbuf: p = (cacheblock *) data->ptr_b;
    switch (data->state) {
    case 4: ⟨Forward the new data past the D-cache if it is write-through 263⟩;
      data->state = 5;
    case 5: if (write_head == wbuf_bot) write_head = wbuf_top; else write_head--;
    write_restart: data->state = 0;
    case 0: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
      if (write_head == write_tail) wait(1);   /* write buffer is empty */
      if (write_head->i == sync) ⟨Ignore the item in write_head 264⟩;
      if (write_head->addr.h & 0xffff0000) goto mem_direct;
      if (ticks.l - write_head->stamp < holding_time && !speed_lock) wait(1);
                                               /* data too raw */
      if (!Dcache) goto mem_direct;            /* not cached */
      if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);   /* D-cache busy */
      startup(&Dcache->reader[j], Dcache->access_time);
      ⟨Write the data into the D-cache and set state = 4, if there’s a cache hit 262⟩;
      data->state = ((Dcache->mode & WRITE_ALLOC) && write_head->i != stunc ? 1 : 3);
      wait(Dcache->access_time);
    case 1: ⟨Try to put the contents of location write_head->addr into the D-cache 261⟩;
      data->state = 2; sleep;
    case 2: data->state = 0; sleep;   /* wake up when the D-cache has the block */
    case 3: ⟨Handle write-around when writing to the D-cache 259⟩;
    mem_direct: ⟨Write directly from write_head to memory 260⟩;
    }
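
The “data too raw” test in state 0 compares an age that is computed modulo $2^{32}$, because the stamp field holds only the low-order tetrabyte of the clock. A minimal sketch of that test, with the hypothetical function name too_raw and with the locks reduced to a boolean flag, shows why the subtraction is done in unsigned arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Should the item stay in the write buffer a little longer?
   now is the low 32 bits of the current time, stamp the low 32 bits of the
   time of the last commit to this item.  Unsigned subtraction gives the
   correct age even if the clock's low half has wrapped around in between. */
static bool too_raw(uint32_t now, uint32_t stamp,
                    uint32_t holding_time, bool speed_requested)
{
    uint32_t age = now - stamp;    /* well defined modulo 2^32 */
    return age < holding_time && !speed_requested;
}
```

For example, an item stamped at 0xfffffffe has age 7 when the low half of the clock reads 5.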

258. ( Local variables 12 ) += 
register cacheblock *p, *q; 

259. The granularity is guaranteed to be 8 in write-around mode (see MMIX_config).
Although an uncached store will not be stored in the D-cache (unless it hits in the
D-cache), it will go into a secondary cache.

⟨Handle write-around when writing to the D-cache 259⟩ ≡
  if (Dcache->flusher.next) wait(1);
  Dcache->outbuf.tag.h = write_head->addr.h;
  Dcache->outbuf.tag.l = write_head->addr.l & (-Dcache->bb);
  for (j = 0; j < Dcache->bb >> Dcache->g; j++) Dcache->outbuf.dirty[j] = false;
  Dcache->outbuf.data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
  Dcache->outbuf.dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
  set_lock(self, wbuf_lock);
  startup(&Dcache->flusher, Dcache->copy_out_time);
  data->state = 5; wait(Dcache->copy_out_time);

This code is used in section 257. 
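
The three index computations used when filling outbuf (the block tag, the octabyte slot within the block, and the dirty-bit slot) are plain bit manipulations. The sketch below isolates them with hypothetical helper names; it assumes that the block size bb is a power of two and that g is the base-2 logarithm of the granularity in bytes, as the shift by Dcache->g suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the first byte of the block containing addr. */
static uint64_t block_tag(uint64_t addr, uint32_t bb)
{
    assert(bb >= 8 && (bb & (bb - 1)) == 0);
    return addr & ~(uint64_t)(bb - 1);
}

/* Index of the octabyte containing addr within its block. */
static uint32_t octa_index(uint64_t addr, uint32_t bb)
{
    return (uint32_t)(addr & (bb - 1)) >> 3;
}

/* Index of the dirty bit covering addr, one bit per 2^g bytes. */
static uint32_t dirty_index(uint64_t addr, uint32_t bb, unsigned g)
{
    return (uint32_t)(addr & (bb - 1)) >> g;
}
```

With 64-byte blocks and granularity 8 (g = 3), address 0x1238 lies in the block tagged 0x1200, in octabyte 7, and is covered by dirty bit 7.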
260. ⟨Write directly from write_head to memory 260⟩ ≡
  if (mem_lock) wait(1);
  set_lock(self, wbuf_lock);
  set_lock(&mem_locker, mem_lock);   /* a coroutine of type vanish */
  startup(&mem_locker, mem_addr_time + mem_write_time);
  if (write_head->addr.h & 0xffff0000)
    spec_write(write_head->addr, write_head->o, write_head->size);
  else mem_write(write_head->addr, write_head->o);
  data->state = 5; wait(mem_addr_time + mem_write_time);

This code is used in section 257. 

261. A subtlety needs to be mentioned here: While we’re trying to update the
D-cache, another instruction might be filling the same cache block (although not because
of the same physical address). Therefore we goto write_restart here instead of saying
wait(1).

⟨Try to put the contents of location write_head->addr into the D-cache 261⟩ ≡
  if (Dcache->filler.next) goto write_restart;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto write_restart;
  p = alloc_slot(Dcache, write_head->addr);
  if (!p) goto write_restart;
  if (Scache) set_lock(&Dcache->filler, Scache->lock)
  else set_lock(&Dcache->filler, mem_lock);
  set_lock(self, Dcache->fill_lock);
  data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) p;
  Dcache->filler_ctl.z.o = write_head->addr;
  startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);

This code is used in section 257. 



access_time: int, §167.
addr: octa, §246.
alloc_slot: static cacheblock *(), §205.
bb: int, §167.
cacheblock = struct, §167.
copy_out_time: int, §167.
data: register control *, §124.
data: octa *, §167.
Dcache: cache *, §168.
dirty: char *, §167.
false = 0, §11.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
flusher: coroutine, §167.
g: int, §167.
get_reader: static int (), §183.
h: tetra, §17.
holding_time: int, §247.
i: internal_opcode, §246.
j: register int, §12.
l: tetra, §17.
lock: lockvar, §167.
lockloc: coroutine **, §23.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_write: void (), §213.
mem_write_time: int, §214.
MMIX_config: void (), MMIX-CONFIG §38.
mode: int, §167.
next: coroutine *, §23.
o: octa, §246.
o: octa, §40.
outbuf: cacheblock, §167.
p: register write_node *, §256.
ptr_b: void *, §44.
reader: coroutine *, §167.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
size: int, §246.
sleep = macro, §125.
spec_write: extern void (), §208.
speed_lock: lockvar, §247.
stamp: tetra, §246.
startup: static void (), §31.
state: int, §44.
stunc = 67, §49.
sync = 79, §49.
tag: octa, §167.
ticks: Extern octa, §87.
true = 1, §11.
vanish = 98, §129.
wait = macro (), §125.
wbuf_bot: write_node *, §247.
wbuf_lock: lockvar, §247.
wbuf_top: write_node *, §247.
WRITE_ALLOC = 2, §166.
write_from_wbuf = 92, §129.
write_head: write_node *, §247.
write_tail: write_node *, §247.
z: spec, §44.
262. Here it is assumed that Dcache->access_time is enough to search the D-cache
and update one octabyte in case of a hit. The D-cache is not locked, since other
coroutines that might be simultaneously reading the D-cache are not going to use the
octabyte that changes. Perhaps the simulator is being too lenient here.

⟨Write the data into the D-cache and set state = 4, if there’s a cache hit 262⟩ ≡
  p = cache_search(Dcache, write_head->addr);
  if (p) {
    p = use_and_fix(Dcache, p);
    set_lock(self, wbuf_lock);
    data->ptr_b = (void *) p;
    p->data[(write_head->addr.l & (Dcache->bb - 1)) >> 3] = write_head->o;
    p->dirty[(write_head->addr.l & (Dcache->bb - 1)) >> Dcache->g] = true;
    data->state = 4; wait(Dcache->access_time);
  }

This code is used in section 257. 
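
The two subscripts used on a cache hit can be tried in isolation. The following is only a sketch, not part of MMIX-PIPE: the helper names octa_index and dirty_index are hypothetical, bb is assumed to be a power of two (the block size in bytes), and g is assumed to be the base-2 logarithm of the number of bytes covered by one dirty flag.

```c
#include <stdint.h>

/* Hypothetical helper: index of the octabyte holding byte address addr_lo
   inside its cache block. bb is the block size in bytes (a power of two). */
static unsigned octa_index(uint32_t addr_lo, unsigned bb) {
  return (addr_lo & (bb - 1)) >> 3;      /* offset in block, divided by 8 */
}

/* Hypothetical helper: index of the dirty flag that covers addr_lo.
   g is assumed to be lg of the number of bytes per dirty flag. */
static unsigned dirty_index(uint32_t addr_lo, unsigned bb, unsigned g) {
  return (addr_lo & (bb - 1)) >> g;
}
```

With 64-byte blocks and one dirty flag per 16 bytes (g = 4), address 0x1234 lies at offset 52 in its block, so the write touches octabyte 6 and dirty flag 3.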

263. ⟨Forward the new data past the D-cache if it is write-through 263⟩ ≡
  if ((Dcache->mode & WRITE_BACK) == 0) {   /* write-through */
    if (Dcache->flusher.next) wait(1);
    flush_cache(Dcache, p, true);
  }

This code is used in section 257. 

264. ⟨Ignore the item in write_head 264⟩ ≡
  {
    set_lock(self, wbuf_lock);
    data->state = 5;
    wait(1);
  }

This code is used in section 257. 
access_time: int, §167.
addr: octa, §246.
bb: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
dirty: char *, §167.
flush_cache: static void (), §203.
flusher: coroutine, §167.
g: int, §167.
l: tetra, §17.
mode: int, §167.
next: coroutine *, §23.
o: octa, §246.
p: register cacheblock *, §258.
ptr_b: void *, §44.
self: register coroutine *, §124.
set_lock = macro (), §37.
state: int, §44.
true = 1, §11.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
wbuf_lock: lockvar, §247.
WRITE_BACK = 1, §166.
write_head: write_node *, §247.
265. Loading and storing. A RISC machine is often said to have a “load/store
architecture,” perhaps because loading and storing are among the most difficult things 
a RISC machine is called upon to do. 

We want memory accesses to be efficient, so we try to access the D-cache at the 
same time as we are translating a virtual address via the DT-cache. Usually we hit in 
both caches, but numerous cases must be dealt with when we miss. Is there an elegant 
way to handle all the contingencies? Alas, the author of this program was unable to 
think of anything better than to throw lots of code at the problem — knowing full 
well that such a spaghetti-like approach is fraught with possibilities for error. 

Instructions like LDO x, y, z operate in two pipeline stages. The first stage
computes the virtual address y + z, waiting if necessary until y and z are both
known; then it starts to access the necessary caches. In the second stage we
ascertain the corresponding physical address and hopefully find the data in the
cache (or in the speculative mem list or the write buffer).

An instruction like STB x, y, z shares some of the computation of LDO x, y, z, because 
only one byte is being stored but the other seven bytes must be found in the cache. 
In this case, however, x is treated as an input, and mem is the output. The second 
stage of a store command can begin even though x is not known during the first stage. 
Here’s what we do at the beginning of stage 1. 

#define ld_st_launch 7   /* state when load/store command has its memory address */

⟨Cases to compute the virtual address of a memory operation 265⟩ ≡
  case preld: case prest: case prego:
    data->z.o = incr(data->z.o, data->xx & -(data->i == prego ? Icache : Dcache)->bb);
        /* (I hope the adder is fast enough) */
  case ld: case ldunc: case ldvts: case st: case pst: case syncd: case syncid:
  start_ld_st: data->y.o = oplus(data->y.o, data->z.o);
    data->state = ld_st_launch; goto switch1;
  case ldptp: case ldpte: if (data->y.o.h) goto start_ld_st;
    data->x.o = zero_octa; data->x.known = true; goto die;   /* page table fault */

This code is used in section 132. 
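
The address arithmetic for the hint instructions can be illustrated on plain 64-bit integers. This is a sketch under stated assumptions, not the simulator’s code: hinted_address is a hypothetical name, uint64_t stands in for the octa type, and bb is assumed to be a power of two. It shows only that xx & -bb rounds the X field down to a multiple of the block size before y and z are added.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the preld/prest/prego case followed by
   start_ld_st: z is advanced by xx rounded down to a multiple of bb,
   and the result is then added to y. */
static uint64_t hinted_address(uint64_t y, uint64_t z, unsigned xx, unsigned bb) {
  z += xx & (0u - bb);   /* same bits as xx & -bb when bb is a power of two */
  return y + z;
}
```

For example, with bb = 64 and xx = 200 the increment is 192, so y = 0x1000 and z = 0x10 give the address 0x10d0.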

266. #define PRW_BITS (data->i < st ? PR_BIT : data->i == pst ? PR_BIT + PW_BIT :
            (data->i == syncid && (data->loc.h & sign_bit)) ? 0 : PW_BIT)

⟨Special cases for states in the first stage 266⟩ ≡
  case ld_st_launch: if ((self + 1)->next) wait(1);   /* second stage must be clear */
    ⟨Handle special cases for operations like prego and ldvts 289⟩;
    if (data->y.o.h & sign_bit) ⟨Do load/store stage 1 with known physical address 271⟩;
    if (page_bad) {
      if (data->i < preld || data->i == st || data->i == pst) data->interrupt |= PRW_BITS;
      goto fin_ex;
    }
    if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    ⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩;
    pass_after(DTcache->access_time); goto passit;
See also sections 310, 326, 360, and 363.

This code is used in section 130. 
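
The PRW_BITS macro picks the protection bits that an operation needs. A minimal stand-alone sketch follows; the function name prw_bits and its loc_is_negative parameter are hypothetical (the latter stands in for the test data->loc.h & sign_bit), and the numeric values are the ones listed in the mini-indexes for §49 and §54.

```c
/* Interrupt codes and internal opcodes, with the values given in the index. */
enum { PX_BIT = 1 << 5, PW_BIT = 1 << 6, PR_BIT = 1 << 7 };
enum { ld = 56, st = 63, syncd = 64, syncid = 65, pst = 66 };

/* Hypothetical function form of PRW_BITS. */
static unsigned prw_bits(int op, int loc_is_negative) {
  if (op < st) return PR_BIT;                      /* loads need read permission */
  if (op == pst) return PR_BIT + PW_BIT;           /* partial stores read and write */
  if (op == syncid && loc_is_negative) return 0;   /* no permission needed in this case */
  return PW_BIT;                                   /* everything else needs write permission */
}
```

Under these assumptions an ordinary load asks for PR_BIT, a partial store for both PR_BIT and PW_BIT, and a full store for PW_BIT alone.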



267. When stage 2 of a load/store command begins, the state will depend on what
transpired in stage 1. For example, data->state will be DT_miss if the virtual address
key can’t be found in the DT-cache; then stage 2 will have to compute the physical
address the hard way.

The data->state will be DT_hit if the physical address is known via the DT-cache,
but the data may or may not be in the D-cache. The data->state will be hit_and_miss
if the DT-cache hits and the D-cache doesn’t. And data->state will be ld_ready if
data->x.o is the desired octabyte (for example, if both caches hit).

#define DT_miss      10   /* second stage state when DT-cache doesn’t hold the key */
#define DT_hit       11   /* second stage state when physical address is known */
#define hit_and_miss 12   /* second stage state when D-cache misses */
#define ld_ready     13   /* second stage state when data has been read */
#define st_ready     14   /* second stage state when data needn’t be read */
#define prest_win    15   /* second stage state when we can fill a block with zeroes */

⟨Look up the address in the DT-cache, and also in the D-cache if possible 267⟩ ≡
  p = cache_search(DTcache, trans_key(data->y.o));
  if (!Dcache || Dcache->lock || (j = get_reader(Dcache)) < 0 ||
        (data->i >= st && data->i <= syncid))
    ⟨Do load/store stage 1 without D-cache lookup 270⟩;
  startup(&Dcache->reader[j], Dcache->access_time);
  if (p) ⟨Do a simultaneous lookup in the D-cache 268⟩
  else data->state = DT_miss;



This code is used in section 266. 
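
The four load-related states can be summarized as a small decision function. This is a simplified sketch, not the simulator’s logic: the function classify and its three flags are hypothetical, and it ignores the write buffer, stores, and protection checks that the real code must also handle.

```c
/* State codes as defined in this section. */
enum stage2_state { DT_miss = 10, DT_hit = 11, hit_and_miss = 12, ld_ready = 13 };

/* Hypothetical summary of stage 1's outcome:
   dt_found   - the DT-cache held the translation key
   d_searched - the D-cache could be searched in stage 1
   d_found    - that search found the block */
static enum stage2_state classify(int dt_found, int d_searched, int d_found) {
  if (!dt_found) return DT_miss;       /* physical address must be computed the hard way */
  if (!d_searched) return DT_hit;      /* address known, D-cache not yet consulted */
  return d_found ? ld_ready : hit_and_miss;
}
```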



access_time: int, §167.
bb: int, §167.
cache_search: static cacheblock *(), §193.
data: register control *, §124.
Dcache: cache *, §168.
die: label, §144.
DTcache: cache *, §168.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
i: internal_opcode, §44.
Icache: cache *, §168.
incr: octa (), MMIX-ARITH §6.
interrupt: unsigned int, §44.
j: register int, §12.
known: bool, §40.
ld = 56, §49.
ldpte = 58, §49.
ldptp = 57, §49.
ldunc = 59, §49.
ldvts = 60, §49.
loc: octa, §44.
lock: lockvar, §167.
mem: specnode, §115.
next: coroutine *, §23.
o: octa, §40.
oplus: octa (), MMIX-ARITH §5.
p: register cacheblock *, §258.
page_bad: bool, §238.
pass_after = macro (), §125.
passit: label, §134.
PR_BIT = 1 << 7, §54.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
pst = 66, §49.
PW_BIT = 1 << 6, §54.
reader: coroutine *, §167.
self: register coroutine *, §124.
sign_bit = macro, §80.
st = 63, §49.
startup: static void (), §31.
state: int, §44.
switch1: label, §130.
syncd = 64, §49.
syncid = 65, §49.
trans_key = macro (), §240.
true = 1, §11.
wait = macro (), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.






268. We assume that it is possible to look up a virtual address in the DT-cache
at the same time as we look for a corresponding physical address in the D-cache,
provided that the lower b + c bits of the two addresses are the same. (They will
always be the same if b + c ≤ page_s; otherwise the operating system can try to make
them the same by “page coloring” whenever possible.) If both caches hit, the physical
address is known in max(DTcache->access_time, Dcache->access_time) cycles.

If the lower b + c bits of the virtual and physical addresses differ, the machine will
not know this until the DT-cache has hit. Therefore we simulate the operation of
accessing the D-cache, but we go to DT_hit instead of to hit_and_miss because the
D-cache will experience a spurious miss.

#define max(x, y) ((x) < (y) ? (y) : (x))

⟨Do a simultaneous lookup in the D-cache 268⟩ ≡
  { octa *m;
    p = use_and_fix(DTcache, p); data->z.o = p->data[0];
    ⟨Check the protection bits and get the physical address 269⟩;
    m = write_search(data, data->z.o);
    if (m == DUNNO) data->state = DT_hit;
    else if (m) data->x.o = *m, data->state = ld_ready;
    else if (Dcache->b + Dcache->c > page_s &&
        ((data->y.o.l ^ data->z.o.l) & ((Dcache->bb << Dcache->c) - (1 << page_s))))
      data->state = DT_hit;   /* spurious D-cache lookup */
    else {
      q = cache_search(Dcache, data->z.o);
      if (q) {
        if (data->i == ldunc) q = demote_and_fix(Dcache, q);
        else q = use_and_fix(Dcache, q);
        data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
        data->state = ld_ready;
      } else data->state = hit_and_miss;
    }
    pass_after(max(DTcache->access_time, Dcache->access_time));
    goto passit;
  }

This code is used in section 267. 
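
The test for a spurious D-cache lookup compares only the address bits that lie above the page offset but still inside the cache index. A minimal sketch, assuming 32-bit low halves of the addresses and b + c < 32; the function name spurious_lookup is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the virtual and physical addresses disagree
   in a bit that selects the D-cache set, so that a lookup made with the
   virtual address would have probed the wrong set.
   b = lg(block size), c = lg(number of sets), page_s = lg(page size). */
static int spurious_lookup(uint32_t virt_lo, uint32_t phys_lo,
                           int b, int c, int page_s) {
  uint32_t mask;
  if (b + c <= page_s) return 0;   /* index bits lie entirely within the page offset */
  mask = ((uint32_t)1 << (b + c)) - ((uint32_t)1 << page_s);   /* bits page_s .. b+c-1 */
  return ((virt_lo ^ phys_lo) & mask) != 0;
}
```

With b = 6, c = 8, and page_s = 13, only bit 13 matters: addresses that differ there trigger the spurious case, while addresses that differ only in higher bits do not.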
269. The protection bits p_r p_w p_x in a translation cache are shifted four positions
right from the interrupt codes PR_BIT, PW_BIT, PX_BIT. If the data is protected, we
abort the load/store operation immediately; this protects the privacy of other users.

⟨Check the protection bits and get the physical address 269⟩ ≡
  if (data->stack_alert) {
    if (data->z.o.l & (PW_BIT >> PROT_OFFSET)) data->stack_alert = false;
    else data->z.o = g[rC].o;   /* use the continuation page for stack overflow */
  }
  j = PRW_BITS;
  if (((data->z.o.l << PROT_OFFSET) & j) != j) {
    if (data->i == syncd || data->i == syncid) goto sync_check;
    if (data->i != preld && data->i != prest)
      data->interrupt |= j & ~(data->z.o.l << PROT_OFFSET);
    data->stack_alert = false;
    goto fin_ex;
  }
  data->z.o = phys_addr(data->y.o, data->z.o);

This code is used in sections 268, 270, and 272. 
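
The permission test reduces to a mask comparison. The sketch below is illustrative only: protection_fault is a hypothetical name, the translation entry is reduced to its low bits, and PROT_OFFSET is taken as 5, the value the mini-index lists for §54.

```c
enum { PX_BIT = 1 << 5, PW_BIT = 1 << 6, PR_BIT = 1 << 7, PROT_OFFSET = 5 };

/* Hypothetical helper: given the low bits of a translation entry and the
   permissions an operation needs (as interrupt codes), return the interrupt
   bits to raise, or 0 if the access is allowed. */
static unsigned protection_fault(unsigned entry_lo, unsigned needed) {
  unsigned granted = entry_lo << PROT_OFFSET;   /* align with PR_BIT, PW_BIT, PX_BIT */
  if ((granted & needed) == needed) return 0;   /* every needed permission is present */
  return needed & ~granted;                     /* report only the missing ones */
}
```

A read-only entry (protection bits 100 in binary) therefore lets a load proceed but makes a store report PW_BIT.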

270. ⟨Do load/store stage 1 without D-cache lookup 270⟩ ≡
  { octa *m;
    if (p) {
      p = use_and_fix(DTcache, p); data->z.o = p->data[0];
      ⟨Check the protection bits and get the physical address 269⟩;
      if (data->i >= st && data->i <= syncid) data->state = st_ready;
      else {
        m = write_search(data, data->z.o);
        if (m && m != DUNNO) data->x.o = *m, data->state = ld_ready;
        else data->state = DT_hit;
      }
    } else data->state = DT_miss;
    pass_after(DTcache->access_time); goto passit;
  }

This code is used in section 267. 



access_time: int, §167.
b: int, §167.
bb: int, §167.
c: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
DT_hit = 11, §267.
DT_miss = 10, §267.
DTcache: cache *, §168.
DUNNO = macro, §254.
false = 0, §11.
fin_ex: label, §144.
g: int, §167.
hit_and_miss = 12, §267.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
ldunc = 59, §49.
o: octa, §40.
octa = struct, §17.
p: register cacheblock *, §258.
page_s: int, §238.
pass_after = macro (), §125.
passit: label, §134.
phys_addr: static octa (), §241.
PR_BIT = 1 << 7, §54.
preld = 61, §49.
prest = 62, §49.
PROT_OFFSET = 5, §54.
PRW_BITS = macro, §266.
PW_BIT = 1 << 6, §54.
PX_BIT = 1 << 5, §54.
q: register cacheblock *, §258.
rC = 8, §52.
st = 63, §49.
st_ready = 14, §267.
stack_alert: bool, §44.
state: int, §44.
sync_check: label, §370.
syncd = 64, §49.
syncid = 65, §49.
use_and_fix: static cacheblock *(), §196.
write_search: static octa *(), §255.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
271. ⟨Do load/store stage 1 with known physical address 271⟩ ≡
  { octa *m;
    if (!(data->loc.h & sign_bit)) {
      if (data->i == syncd || data->i == syncid) goto sync_check;
      if (data->i != preld && data->i != prest) data->interrupt |= N_BIT;
      goto fin_ex;
    }
    data->z.o = data->y.o; data->z.o.h -= sign_bit;
    if (data->z.o.h & 0xffff0000) {
      switch (data->i) {
      case ldvts: case preld: case prest: case prego: case syncd: case syncid:
        goto fin_ex;
      case ld: case ldunc: if (mem_lock) wait(1);
        if (data->op < LDSF) i = (data->op & 0xf) >> 2;
        else if (data->op < CSWAP) i = 2;
        else i = 3;
        data->x.o = spec_read(data->z.o, i);
        goto make_ld_ready;
      case pst:
        if ((data->op ^ CSWAP) <= 1) {
          data->x.o = spec_read(data->z.o, 3); goto make_ld_ready;
        }
        data->x.o = zero_octa;
      case st: data->state = st_ready; pass_after(1); goto passit;
      }
    } else if (data->i >= st && data->i <= syncid) {
      data->state = st_ready; pass_after(1); goto passit;
    }
    m = write_search(data, data->z.o);
    if (m) {
      if (m == DUNNO) data->state = DT_hit;
      else data->x.o = *m, data->state = ld_ready;
      pass_after(1); goto passit;
    } else if (!Dcache) {
      if (mem_lock) wait(1);
      data->x.o = mem_read(data->z.o);
    make_ld_ready: set_lock(&mem_locker, mem_lock);
      data->state = ld_ready;
      startup(&mem_locker, mem_addr_time + mem_read_time);
      pass_after(mem_addr_time + mem_read_time); goto passit;
    }
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) {
      data->state = DT_hit; pass_after(1); goto passit;
    }
    startup(&Dcache->reader[j], Dcache->access_time);
    q = cache_search(Dcache, data->z.o);
    if (q) {
      if (data->i == ldunc) q = demote_and_fix(Dcache, q);
      else q = use_and_fix(Dcache, q);
      data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
      data->state = ld_ready;
    } else data->state = hit_and_miss;
    pass_after(Dcache->access_time); goto passit;
  }

This code is used in section 266. 



access_time: int, §167.
bb: int, §167.
cache_search: static cacheblock *(), §193.
CSWAP = #94, §47.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
DT_hit = 11, §267.
DUNNO = macro, §254.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
hit_and_miss = 12, §267.
i: internal_opcode, §44.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld = 56, §49.
ld_ready = 13, §267.
LDSF = #90, §47.
ldunc = 59, §49.
ldvts = 60, §49.
loc: octa, §44.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
N_BIT = 1 << 4, §54.
o: octa, §40.
octa = struct, §17.
op: mmix_opcode, §44.
pass_after = macro (), §125.
passit: label, §134.
prego = 73, §49.
preld = 61, §49.
prest = 62, §49.
pst = 66, §49.
q: register cacheblock *, §258.
reader: coroutine *, §167.
set_lock = macro (), §37.
sign_bit = macro, §80.
spec_read: extern octa (), §208.
st = 63, §49.
st_ready = 14, §267.
startup: static void (), §31.
state: int, §44.
sync_check: label, §370.
syncd = 64, §49.
syncid = 65, §49.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
write_search: static octa *(), §255.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
272. The program for the second stage is, likewise, rather long-winded, yet quite
similar to the cache manipulations we have already seen several times.

Several instructions might be trying to fill the DT-cache for the same page. (A
similar situation faced us in the write_from_wbuf coroutine.) The second stage
therefore needs to do some translation cache searching just as the first stage did.
In this stage, however, we don’t go all out for speed, because DT-cache misses are rare.

#define DT_retry 8   /* second stage state when DT-cache should be searched again */
#define got_DT   9   /* second stage state when DT-cache entry has been computed */

⟨Special cases for states in later stages 272⟩ ≡
  square_one: data->state = DT_retry;
  case DT_retry: if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
    startup(&DTcache->reader[j], DTcache->access_time);
    p = cache_search(DTcache, trans_key(data->y.o));
    if (p) {
      p = use_and_fix(DTcache, p); data->z.o = p->data[0];
      ⟨Check the protection bits and get the physical address 269⟩;
      if (data->i >= st && data->i <= syncid) data->state = st_ready;
      else data->state = DT_hit;
    } else data->state = DT_miss;
    wait(DTcache->access_time);
  case DT_miss: if (DTcache->filler.next)
      if (data->i == preld || data->i == prest) goto fin_ex; else goto square_one;
    if (no_hardware_PT || page_f)
      if (data->i == preld || data->i == prest) goto fin_ex; else goto emulate_virt;
    p = alloc_slot(DTcache, trans_key(data->y.o));
    if (!p) goto square_one;
    data->ptr_b = DTcache->filler_ctl.ptr_b = (void *) p;
    DTcache->filler_ctl.y.o = data->y.o;
    set_lock(self, DTcache->fill_lock);
    startup(&DTcache->filler, 1);
    data->state = got_DT;
    if (data->i == preld || data->i == prest) goto fin_ex; else sleep;
  case got_DT: release_lock(self, DTcache->fill_lock);
    ⟨Check the protection bits and get the physical address 269⟩;
    if (data->i >= st && data->i <= syncid) goto finish_store;
        /* otherwise we fall through to ld_retry below */

See also sections 273, 276, 279, 280, 299, 311, 354, 364, and 370. 

This code is used in section 135. 
273. The second stage might also want to fill the D-cache (and perhaps the S-cache)
as we get the data.

Several load instructions might be trying to fill the same cache block. So we
should go back and look in the D-cache again if we miss and cannot allocate a slot
immediately.

A PRELD or PREST instruction, which is just a “hint,” doesn’t do anything more if
the caches are already busy.

⟨Special cases for states in later stages 272⟩ +≡
  ld_retry: data->state = DT_hit;
  case DT_hit: if (data->i == preld || data->i == prest) goto fin_ex;
    ⟨Check for a hit in pending writes 278⟩;
    if ((data->z.o.h & 0xffff0000) || !Dcache)
      ⟨Do load/store stage 2 without D-cache lookup 277⟩;
    if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
    startup(&Dcache->reader[j], Dcache->access_time);
    q = cache_search(Dcache, data->z.o);
    if (q) {
      if (data->i == ldunc) q = demote_and_fix(Dcache, q);
      else q = use_and_fix(Dcache, q);
      data->x.o = q->data[(data->z.o.l & (Dcache->bb - 1)) >> 3];
      data->state = ld_ready;
    } else data->state = hit_and_miss;
    wait(Dcache->access_time);
  case hit_and_miss: if (data->i == ldunc) goto avoid_D;
    ⟨Try to get the contents of location data->z.o in the D-cache 274⟩;



access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
avoid_D: label, §277.
bb: int, §167.
cache_search: static cacheblock *(), §193.
data: octa *, §167.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
DT_hit = 11, §267.
DT_miss = 10, §267.
DTcache: cache *, §168.
emulate_virt: label, §310.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
finish_store: label, §280.
get_reader: static int (), §183.
h: tetra, §17.
hit_and_miss = 12, §267.
i: internal_opcode, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
ldunc = 59, §49.
lock: lockvar, §167.
next: coroutine *, §23.
no_hardware_PT: bool, §242.
o: octa, §40.
p: register cacheblock *, §258.
page_f: int, §238.
preld = 61, §49.
prest = 62, §49.
ptr_b: void *, §44.
q: register cacheblock *, §258.
reader: coroutine *, §167.
release_lock = macro (), §37.
self: register coroutine *, §124.
set_lock = macro (), §37.
sleep = macro, §125.
st = 63, §49.
st_ready = 14, §267.
startup: static void (), §31.
state: int, §44.
syncid = 65, §49.
trans_key = macro (), §240.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
write_from_wbuf = 92, §129.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
274. ⟨Try to get the contents of location data->z.o in the D-cache 274⟩ ≡
  ⟨Check for prest with a fully spanned cache block 275⟩;
  if (Dcache->filler.next) goto ld_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto ld_retry;
  q = alloc_slot(Dcache, data->z.o);
  if (!q) goto ld_retry;
  if (Scache) set_lock(&Dcache->filler, Scache->lock)
  else set_lock(&Dcache->filler, mem_lock);
  set_lock(self, Dcache->fill_lock);
  data->ptr_b = Dcache->filler_ctl.ptr_b = (void *) q;
  Dcache->filler_ctl.z.o = data->z.o;
  startup(&Dcache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = ld_ready;
  if (data->i == preld || data->i == prest) goto fin_ex; else sleep;

This code is used in section 273. 

275. If a prest instruction makes it to the hot seat, we have been assured by the user
of PREST that the current values of bytes in virtual addresses
data->y.o - (data->xx & -Dcache->bb) through data->y.o + (data->xx & (Dcache->bb - 1))
are irrelevant. Hence we can pretend that we know they are zero. This is advantageous
if it saves us from filling a cache block from the S-cache or from memory.

⟨Check for prest with a fully spanned cache block 275⟩ ≡
  if (data->i == prest &&
      (data->xx >= Dcache->bb || ((data->y.o.l & (Dcache->bb - 1)) == 0)) &&
      ((data->y.o.l + (data->xx & (Dcache->bb - 1)) + 1) ^ data->y.o.l) >= Dcache->bb)
    goto prest_span;

This code is used in section 274. 
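
The spanning test can be checked with a few concrete addresses. The following is a sketch only: prest_spans_block is a hypothetical name, y_lo stands for data->y.o.l, and bb is assumed to be a power of two.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the hinted byte range covers the whole
   cache block that contains y_lo. The first clause says that the range
   begins at or before the start of the block; the second says that the
   byte just past the end of the range lies in a different block. */
static int prest_spans_block(uint32_t y_lo, unsigned xx, uint32_t bb) {
  return (xx >= bb || (y_lo & (bb - 1)) == 0) &&
         (((y_lo + (xx & (bb - 1)) + 1) ^ y_lo) >= bb);
}
```

With 64-byte blocks, a hint of 64 bytes (xx = 63) at the aligned address 0x1000 spans its block, but the same hint at 0x1010 does not, and neither does a 63-byte hint (xx = 62) at 0x1000.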

276. ⟨Special cases for states in later stages 272⟩ +≡
  prest_span: data->state = prest_win;
  case prest_win: if (data != old_hot || Dlocker.next) wait(1);
    if (Dcache->lock) goto fin_ex;
    q = alloc_slot(Dcache, data->z.o);   /* OK if Dcache->filler is busy */
    if (q) {
      clean_block(Dcache, q);
      q->tag = data->z.o; q->tag.l &= -Dcache->bb;
      set_lock(&Dlocker, Dcache->lock);
      startup(&Dlocker, Dcache->copy_in_time);
    }
    goto fin_ex;

277. ⟨Do load/store stage 2 without D-cache lookup 277⟩ ≡
  {
  avoid_D: if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    data->x.o = mem_read(data->z.o);
    data->state = ld_ready; wait(mem_addr_time + mem_read_time);
  }

This code is used in section 273. 
278. ⟨Check for a hit in pending writes 278⟩ ≡
  {
    octa *m = write_search(data, data->z.o);
    if (m == DUNNO) wait(1);
    if (m) {
      data->x.o = *m;
      data->state = ld_ready;
      wait(1);
    }
  }

This code is used in section 273. 



access_time: int, §167.
alloc_slot: static cacheblock *(), §205.
bb: int, §167.
clean_block: void (), §179.
copy_in_time: int, §167.
data: register control *, §124.
Dcache: cache *, §168.
Dlocker: coroutine, §127.
DUNNO = macro, §254.
fill_lock: lockvar, §167.
filler: coroutine, §167.
filler_ctl: control, §167.
fin_ex: label, §144.
i: internal_opcode, §44.
l: tetra, §17.
ld_ready = 13, §267.
ld_retry: label, §273.
lock: lockvar, §167.
mem_addr_time: int, §214.
mem_lock: lockvar, §214.
mem_locker: coroutine, §127.
mem_read: octa (), §210.
mem_read_time: int, §214.
next: coroutine *, §23.
o: octa, §40.
octa = struct, §17.
old_hot: control *, §60.
preld = 61, §49.
prest = 62, §49.
prest_win = 15, §267.
ptr_b: void *, §44.
q: register cacheblock *, §258.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
sleep = macro, §125.
startup: static void (), §31.
state: int, §44.
tag: octa, §167.
wait = macro (), §125.
write_search: static octa *(), §255.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
279. The requested octabyte will arrive sooner or later in data->x.o. Then a load
instruction is almost done, except that we might need to massage the input a little
bit.

⟨Special cases for states in later stages 272⟩ +≡
  case ld_ready: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->i >= st) goto finish_store;
    switch (data->op >> 1) {
    case LDB >> 1: case LDBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_ld;
    case LDW >> 1: case LDWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_ld;
    case LDT >> 1: case LDTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
    fin_ld: data->x.o = shift_right(shift_left(data->x.o, j), i, data->op & 0x2);
    default: goto fin_ex;
    case LDHT >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
      data->x.o.l = 0; goto fin_ex;
    case LDSF >> 1: if (data->z.o.l & 4) data->x.o.h = data->x.o.l;
      if ((data->x.o.h & 0x7f800000) == 0 && (data->x.o.h & 0x7fffff)) {
        data->x.o = load_sf(data->x.o.h);
        data->state = 3; wait(denin_penalty);
      }
      else data->x.o = load_sf(data->x.o.h); goto fin_ex;
    case LDPTP >> 1:
      if ((data->x.o.h & sign_bit) == 0 || (data->x.o.l & 0x1ff8) != page_n)
        data->x.o = zero_octa;
      else data->x.o.l &= -(1 << 13);
      goto fin_ex;
    case LDPTE >> 1: if ((data->x.o.l & 0x1ff8) != page_n) data->x.o = zero_octa;
      else data->x.o = incr(oandn(data->x.o, page_mask), data->x.o.l & 0x7);
      data->x.o.h &= 0xffff; goto fin_ex;
    case UNSAVE >> 1: ⟨Handle an internal UNSAVE when it’s time to load 336⟩;

} 
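
The byte, wyde, and tetra cases all use the same two shifts: a left shift by j discards the bytes that precede the field, and a right shift by i brings the field down to the low end, with or without sign extension. A minimal sketch on uint64_t follows; load_field and its parameters are hypothetical stand-ins for the shift_left and shift_right calls on octa values, and sign extension is done by hand so that the result does not depend on how a compiler shifts negative numbers.

```c
#include <stdint.h>

/* Hypothetical helper: extract the field of 1 << size_lg bytes (size_lg is
   0, 1, or 2) that contains byte addr_lo of a big-endian octabyte. */
static uint64_t load_field(uint64_t octa, uint32_t addr_lo, int size_lg,
                           int is_unsigned) {
  unsigned j = (addr_lo & (7u & ~((1u << size_lg) - 1))) << 3;  /* bits to drop on the left */
  unsigned i = 64 - (8u << size_lg);                            /* 56, 48, or 32 */
  uint64_t v = (octa << j) >> i;                                /* field, zero-extended */
  if (!is_unsigned) {
    uint64_t sign = (uint64_t)1 << (63 - i);                    /* sign bit of the field */
    v = (v ^ sign) - sign;                                      /* sign-extend to 64 bits */
  }
  return v;
}
```

For the octabyte 0x0123456789abcdef, a signed byte load at offset 7 yields 0xffffffffffffffef, while the unsigned form yields 0xef.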

280. ⟨Special cases for states in later stages 272⟩ +≡
  finish_store: data->state = st_ready;
  case st_ready: switch (data->i) {
    case st: case pst: ⟨Finish a store command 281⟩;
    case syncd: data->b.o.l = (Dcache ? Dcache->bb : 8192); goto do_syncd;
    case syncid: data->b.o.l = (Icache ? Icache->bb : 8192);
      if (Dcache && Dcache->bb < data->b.o.l) data->b.o.l = Dcache->bb;
      goto do_syncid;
    }

281. Store instructions have an extra complication, because some of them need to
check for overflow.

⟨Finish a store command 281⟩ ≡
  data->x.addr = data->z.o;
  if (data->b.p) wait(1);
  switch (data->op >> 1) {
  case STUNC >> 1: data->i = stunc;
  default: data->x.o = data->b.o; goto fin_ex;
  case STSF >> 1: set_round; data->b.o.h = store_sf(data->b.o);
    data->interrupt |= exceptions;
    if ((data->b.o.h & 0x7f800000) == 0 && (data->b.o.h & 0x7fffff)) {
      if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
      else data->x.o.h = data->b.o.h;
      data->state = 3; wait(denout_penalty);
    }
  case STHT >> 1: if (data->z.o.l & 4) data->x.o.l = data->b.o.h;
    else data->x.o.h = data->b.o.h;
    goto fin_ex;
  case STB >> 1: case STBU >> 1: j = (data->z.o.l & 0x7) << 3; i = 56; goto fin_st;
  case STW >> 1: case STWU >> 1: j = (data->z.o.l & 0x6) << 3; i = 48; goto fin_st;
  case STT >> 1: case STTU >> 1: j = (data->z.o.l & 0x4) << 3; i = 32;
  fin_st: ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic
        exceptions if signed 282⟩;
    goto fin_ex;
  case CSWAP >> 1: ⟨Finish a CSWAP 283⟩;
  case SAVE >> 1: ⟨Handle an internal SAVE when it’s time to store 342⟩;
  }

This code is used in section 280. 



addr: octa, §40.
b: spec, §44.
bb: int, §167.
CSWAP = #94, §47.
data: register control *, §124.
Dcache: cache *, §168.
denin_penalty: int, §349.
denout_penalty: int, §349.
do_syncd: label, §364.
do_syncid: label, §364.
exceptions: int, MMIX-ARITH §32.
fin_ex: label, §144.
h: tetra, §17.
i: register int, §12.
i: internal_opcode, §44.
Icache: cache *, §168.
incr: octa (), MMIX-ARITH §6.
interrupt: unsigned int, §44.
j: register int, §12.
l: tetra, §17.
ld_ready = 13, §267.
LDB = #80, §47.
LDBU = #82, §47.
LDHT = #92, §47.
LDPTE = macro, §235.
LDPTP = macro, §235.
LDSF = #90, §47.
LDT = #88, §47.
LDTU = #8a, §47.
LDW = #84, §47.
LDWU = #86, §47.
load_sf: octa (), MMIX-ARITH §39.
lockloc: coroutine **, §23.
o: octa, §40.
oandn: octa (), MMIX-ARITH §25.
op: mmix_opcode, §44.
p: specnode *, §40.
page_mask: octa, §238.
page_n: int, §238.
pst = 66, §49.
SAVE = #fa, §47.
self: register coroutine *, §124.
set_round = macro, §346.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §80.
st = 63, §49.
st_ready = 14, §267.
state: int, §44.
STB = #a0, §47.
STBU = #a2, §47.
STHT = #b2, §47.
store_sf: tetra (), MMIX-ARITH §40.
STSF = #b0, §47.
STT = #a8, §47.
STTU = #aa, §47.
stunc = 67, §49.
STUNC = #b6, §47.
STW = #a4, §47.
STWU = #a6, §47.
syncd = 64, §49.
syncid = 65, §49.
UNSAVE = #fb, §47.
wait = macro (), §125.
x: specnode, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
282. ⟨Insert data->b.o into the proper field of data->x.o, checking for arithmetic
      exceptions if signed 282⟩ ≡
  {
    octa mask;
    if (!(data->op & 2)) { octa before, after;
      before = data->b.o; after = shift_right(shift_left(data->b.o, i), i, 0);
      if (before.l != after.l || before.h != after.h) data->interrupt |= V_BIT;
    }
    mask = shift_right(shift_left(neg_one, i), j, 1);
    data->b.o = shift_right(shift_left(data->b.o, i), j, 1);
    data->x.o.h ^= mask.h & (data->x.o.h ^ data->b.o.h);
    data->x.o.l ^= mask.l & (data->x.o.l ^ data->b.o.l);
  }

This code is used in section 281. 
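
The masked merge and the overflow test can both be expressed on a single 64-bit word. The sketch below is illustrative and assumes uint64_t in place of the two-tetra octa type; store_field and store_overflows are hypothetical names, and i and j have the meanings used in §281 (i is 56, 48, or 32; j is the bit offset of the field from the left).

```c
#include <stdint.h>

/* Hypothetical helper: replace the field of (64 - i) bits that starts
   j bits from the left of old with the low (64 - i) bits of value. */
static uint64_t store_field(uint64_t old, uint64_t value, unsigned i, unsigned j) {
  uint64_t mask  = (~(uint64_t)0 << i) >> j;   /* ones exactly over the field */
  uint64_t field = (value << i) >> j;          /* value moved into position */
  return old ^ (mask & (old ^ field));         /* take field bits where mask is 1 */
}

/* Hypothetical helper: nonzero if value does not survive truncation to
   (64 - i) bits followed by sign extension, the case that sets V_BIT. */
static int store_overflows(uint64_t value, unsigned i) {
  uint64_t v = (value << i) >> i;              /* keep the low (64 - i) bits */
  uint64_t sign = (uint64_t)1 << (63 - i);
  v = (v ^ sign) - sign;                       /* sign-extend back to 64 bits */
  return v != value;
}
```

Storing the byte 0xff at offset 2 of 0x0123456789abcdef (i = 56, j = 16) gives 0x0123ff6789abcdef. A signed byte store of 128 overflows, whereas -128 fits.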

283. The CSWAP operation has four inputs ($X, $Y, $Z, rP) as well as three outputs
($X, M8[A], rP). To keep from exceeding the capacity of the control blocks in our
pipeline, we wait until this instruction reaches the hot seat, thereby allowing us
nonspeculative access to rP.

⟨Finish a CSWAP 283⟩ ≡
  if (data != old_hot) wait(1);
  if (data->x.o.h == g[rP].o.h && data->x.o.l == g[rP].o.l) {
    data->a.o.l = 1;   /* data->a.o.h is zero */
    data->x.o = data->b.o;
  } else {
    g[rP].o = data->x.o;   /* data->a.o is zero */
    if (verbose & issue_bit) {
      printf(" setting rP="); print_octa(g[rP].o); printf("\n");
    }
  }
  data->i = cswap;   /* cosmetic change, affects the trace output only */
  goto fin_ex;

This code is used in section 281. 
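
Stripped of pipeline bookkeeping, the compare-and-swap step behaves as sketched below. This is an illustration under simplifying assumptions, not the simulator’s code: cswap_step is a hypothetical name, memory and rP are modeled as plain uint64_t variables, and the return value plays the role of the new contents of $X.

```c
#include <stdint.h>

/* Hypothetical helper: if the octabyte in memory equals rP, store x there
   and report 1; otherwise copy the memory octabyte into rP and report 0. */
static int cswap_step(uint64_t *mem, uint64_t *rP, uint64_t x) {
  if (*mem == *rP) {
    *mem = x;       /* success: memory receives the old $X */
    return 1;
  }
  *rP = *mem;       /* failure: rP learns the current memory value */
  return 0;
}
```

A first call with matching values succeeds and updates memory. A second call with a stale rP fails and refreshes rP, so that a retry would succeed.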
284. The fetch stage. Now that we’ve mastered the most difficult memory
operations, we can relax and apply our knowledge to the slightly simpler task of
filling the fetch buffer. Fetching is like loading/storing, except that we use the I-cache
instead of the D-cache. It’s slightly simpler because the I-cache is read-only. Further
simplifications would be possible if there were no PREGO instruction, because there is
only one fetch unit. However, we want to implement PREGO with reasonable efficiency,
in order to see if that instruction is worthwhile; so we include the complications of
simultaneous I-cache and IT-cache readers, which we have already implemented for
the D-cache and DT-cache.

The fetch coroutine is always present, as the one and only coroutine with stage
number zero.

In normal circumstances, the fetch coroutine accesses a cache block containing the
instruction whose virtual address is given by inst_ptr (the instruction pointer), and
transfers up to fetch_max instructions from that block to the fetch buffer.
Complications arise if the instruction isn’t in the cache, or if we can’t translate the
virtual address because of a miss in the IT-cache. Moreover, inst_ptr is a spec variable
whose value might not even be known; if inst_ptr.p is nonnull, we don’t know what to fetch.

⟨External variables 4⟩ +≡
  Extern spec inst_ptr;    /* the instruction pointer (aka program counter) */
  Extern octa *fetched;    /* buffer for incoming instructions */

285. The fetch coroutine usually begins a cycle in state fetch_ready, with the most
recently fetched octabytes in positions fetch_lo, fetch_lo + 1, ..., fetch_hi - 1 of a
buffer called fetched. Once that buffer has been exhausted, the coroutine reverts to
state 0; with luck, the buffer might have more data by the time the next cycle rolls
around.

⟨Global variables 20⟩ +≡
  int fetch_lo, fetch_hi;  /* the active region of that buffer */
  coroutine fetch_co;
  control fetch_ctl;



286. ⟨Initialize everything 22⟩ +≡
  fetch_co.ctl = &fetch_ctl;
  fetch_co.name = "Fetch";
  fetch_ctl.go.o.l = 4;
  startup(&fetch_co, 1);

287. ⟨Restart the fetch coroutine 287⟩ ≡
  if (fetch_co.lockloc) *(fetch_co.lockloc) = NULL, fetch_co.lockloc = NULL;
  unschedule(&fetch_co);
  startup(&fetch_co, 1);

This code is used in sections 85, 160, 308, 309, and 316. 

288. Some of the actions here are done not only by the fetcher but also by the first 
and second stages of a prego operation. 

#define wait_or_pass(t)
    if (data->i == prego) { pass_after(t); goto passit; }
    else wait(t)

⟨Simulate an action of the fetch coroutine 288⟩ ≡
  switch0: switch (data->state) {
  new_fetch: data->state = 0;
  case 0: ⟨Wait, if necessary, until the instruction pointer is known 290⟩;
    data->y.o = inst_ptr.o;
    data->state = 1; data->interrupt = 0; data->x.o = data->z.o = zero_octa;
  case 1: start_fetch: if (data->y.o.h & sign_bit)
      ⟨Begin fetch with known physical address 296⟩;
    if (page_bad) goto bad_fetch;
    if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
    startup(&ITcache->reader[j], ITcache->access_time);
    ⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩;
    wait_or_pass(ITcache->access_time);
  ⟨Other cases for the fetch coroutine 298⟩
  }

This code is used in section 125. 

289. ⟨Handle special cases for operations like prego and ldvts 289⟩ ≡
  if (data->i == prego) goto start_fetch;

See also section 352. 

This code is used in section 266. 

290. ⟨Wait, if necessary, until the instruction pointer is known 290⟩ ≡
  if (inst_ptr.p) {
    if (inst_ptr.p != UNKNOWN_SPEC && inst_ptr.p->known)
      inst_ptr.o = inst_ptr.p->o, inst_ptr.p = NULL;
    wait(1);
  }

This code is used in section 288. 
291. #define got_IT 19         /* state when IT-cache entry has been computed */
#define IT_miss 20        /* state when IT-cache doesn’t hold the key */
#define IT_hit 21         /* state when physical instruction address is known */
#define Ihit_and_miss 22  /* state when I-cache misses */
#define fetch_ready 23    /* state when instructions have been read */
#define got_one 24        /* state when a “preview” octabyte is ready */

⟨Look up the address in the IT-cache, and also in the I-cache if possible 291⟩ ≡
  p = cache_search(ITcache, trans_key(data->y.o));
  if (!Icache || Icache->lock || (j = get_reader(Icache)) < 0)
    ⟨Begin fetch without I-cache lookup 295⟩;
  startup(&Icache->reader[j], Icache->access_time);
  if (p) ⟨Do a simultaneous lookup in the I-cache 292⟩
  else data->state = IT_miss;

This code is used in section 288. 



292. We assume that it is possible to look up a virtual address in the IT-cache
at the same time as we look for a corresponding physical address in the I-cache,
provided that the lower b + c bits of the two addresses are the same. (See the remarks
about “page coloring,” when we made similar assumptions about the DT-cache and
D-cache.)

⟨Do a simultaneous lookup in the I-cache 292⟩ ≡
  {
    ⟨Update IT-cache usage and check the protection bits 293⟩;
    data->z.o = phys_addr(data->y.o, p->data[0]);
    if (Icache->b + Icache->c > page_s &&
        ((data->y.o.l ^ data->z.o.l) & ((Icache->bb << Icache->c) - (1 << page_s))))
      data->state = IT_hit;  /* spurious I-cache lookup */
    else {
      q = cache_search(Icache, data->z.o);
      if (q) {
        q = use_and_fix(Icache, q);
        ⟨Copy the data from block q to fetched 294⟩;
        data->state = fetch_ready;
      } else data->state = Ihit_and_miss;
    }
    wait_or_pass(max(ITcache->access_time, Icache->access_time));
  }

This code is used in section 291. 
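
The test for a spurious lookup in section 292 compares only those address bits that select a cache set but lie above the page offset. A minimal sketch of that test in isolation follows; the function name `lookup_was_spurious` is hypothetical, the parameters `b`, `c`, and `page_s` stand for the cache fields and page size exponent used above, and `b + c` is assumed to be less than 32.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the virtual and physical low words
   disagree in the set-index bits that lie above the page offset. */
int lookup_was_spurious(uint32_t virt_l, uint32_t phys_l,
                        int b, int c, int page_s)
{
  uint32_t mask;
  if (b + c <= page_s) return 0;  /* all index bits are inside the page offset */
  mask = ((uint32_t)1 << (b + c)) - ((uint32_t)1 << page_s);  /* bits page_s..b+c-1 */
  return ((virt_l ^ phys_l) & mask) != 0;
}
```

With `b = 5`, `c = 9`, and `page_s = 13`, only bit 13 is examined; addresses that differ there would have indexed different cache sets, so the result of the early lookup must be discarded.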

293. ⟨Update IT-cache usage and check the protection bits 293⟩ ≡
  p = use_and_fix(ITcache, p);
  if (!(p->data[0].l & (PX_BIT >> PROT_OFFSET))) goto bad_fetch;

This code is used in sections 292 and 295. 

294. At this point inst_ptr.o equals data->y.o.

⟨Copy the data from block q to fetched 294⟩ ≡
  if (data->i != prego) {
    for (j = 0; j < Icache->bb >> 3; j++) fetched[j] = q->data[j];
    fetch_lo = (inst_ptr.o.l & (Icache->bb - 1)) >> 3;
    fetch_hi = Icache->bb >> 3;
  }

This code is used in sections 292 and 296. 
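
The index arithmetic of section 294 is easy to check by hand. The sketch below isolates it; the type `fetch_window` and the function `window_for` are hypothetical, and the block size `bb` (in bytes) is assumed to be a power of 2 that is at least 8.

```c
#include <stdint.h>

typedef struct { int lo, hi; } fetch_window;  /* hypothetical type */

/* Computes which octabytes of a cache block remain to be consumed,
   given the low word of the instruction address and the block size. */
fetch_window window_for(uint32_t addr_l, int bb)
{
  fetch_window w;
  w.lo = (int)((addr_l & (uint32_t)(bb - 1)) >> 3);  /* octabyte holding the instruction */
  w.hi = bb >> 3;                                    /* number of octabytes per block */
  return w;
}
```

For a 32-byte block and an address ending in hexadecimal 14, the byte offset within the block is 20, so fetching starts at octabyte 2 of 4.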

295. ⟨Begin fetch without I-cache lookup 295⟩ ≡
  {
    if (p) {
      ⟨Update IT-cache usage and check the protection bits 293⟩;
      data->z.o = phys_addr(data->y.o, p->data[0]);
      data->state = IT_hit;
    } else data->state = IT_miss;
    wait_or_pass(ITcache->access_time);
  }

This code is used in section 291. 
296. ⟨Begin fetch with known physical address 296⟩ ≡
  {
    if (data->i == prego && !(data->loc.h & sign_bit)) goto fin_ex;
    data->z.o = data->y.o; data->z.o.h -= sign_bit;
  known_phys: if (data->z.o.h & 0xffff0000) goto bad_fetch;
    if (!Icache) ⟨Read from memory into fetched 297⟩;
    if (Icache->lock || (j = get_reader(Icache)) < 0) {
      data->state = IT_hit; wait_or_pass(1);
    }
    startup(&Icache->reader[j], Icache->access_time);
    q = cache_search(Icache, data->z.o);
    if (q) {
      q = use_and_fix(Icache, q);
      ⟨Copy the data from block q to fetched 294⟩;
      data->state = fetch_ready;
    } else data->state = Ihit_and_miss;
    wait_or_pass(Icache->access_time);
  }

This code is used in section 288. 



297. ⟨Read from memory into fetched 297⟩ ≡
  { octa addr;
    addr = data->z.o;
    if (mem_lock) wait(1);
    set_lock(&mem_locker, mem_lock);
    startup(&mem_locker, mem_addr_time + mem_read_time);
    addr.l &= -(bus_words << 3);
    fetched[0] = mem_read(addr);
    for (j = 1; j < bus_words; j++)
      fetched[j] = mem_hash[last_h].chunk[((addr.l & 0xffff) >> 3) + j];
    fetch_lo = (data->z.o.l >> 3) & (bus_words - 1); fetch_hi = bus_words;
    data->state = fetch_ready;
    wait(mem_addr_time + mem_read_time);
  }

This code is used in section 296. 

298. ⟨Other cases for the fetch coroutine 298⟩ ≡
  case IT_miss: if (ITcache->filler.next)
      if (data->i == prego) goto fin_ex; else wait(1);
    if (no_hardware_PT || page_f) ⟨Insert dummy instruction for page table emulation 302⟩;
    p = alloc_slot(ITcache, trans_key(data->y.o));
    if (!p)  /* hey, it was present after all */
      if (data->i == prego) goto fin_ex; else goto new_fetch;
    data->ptr_b = ITcache->filler_ctl.ptr_b = (void *) p;
    ITcache->filler_ctl.y.o = data->y.o;
    set_lock(self, ITcache->fill_lock);
    startup(&ITcache->filler, 1);
    data->state = got_IT;
    if (data->i == prego) goto fin_ex; else sleep;
  case got_IT: release_lock(self, ITcache->fill_lock);
    if (!(data->z.o.l & (PX_BIT >> PROT_OFFSET))) goto bad_fetch;
    data->z.o = phys_addr(data->y.o, data->z.o);
  fetch_retry: data->state = IT_hit;
  case IT_hit: if (data->i == prego) goto fin_ex; else goto known_phys;
  case Ihit_and_miss: ⟨Try to get the contents of location data->z.o in the I-cache 300⟩;

See also section 301. 

This code is used in section 288. 

299. ⟨Special cases for states in later stages 272⟩ +≡
  case IT_miss: case Ihit_and_miss: case IT_hit: case fetch_ready: goto switch0;
300. ⟨Try to get the contents of location data->z.o in the I-cache 300⟩ ≡
  if (Icache->filler.next) goto fetch_retry;
  if ((Scache && Scache->lock) || (!Scache && mem_lock)) goto fetch_retry;
  q = alloc_slot(Icache, data->z.o);
  if (!q) goto fetch_retry;
  if (Scache) set_lock(&Icache->filler, Scache->lock)
  else set_lock(&Icache->filler, mem_lock);
  set_lock(self, Icache->fill_lock);
  data->ptr_b = Icache->filler_ctl.ptr_b = (void *) q;
  Icache->filler_ctl.z.o = data->z.o;
  startup(&Icache->filler, Scache ? Scache->access_time : mem_addr_time);
  data->state = got_one;
  if (data->i == prego) goto fin_ex; else sleep;

This code is used in section 298. 



301. The I-cache filler will wake us up with the octabyte we want, before it has
filled the entire cache block. In that case we can fetch one or two instructions before
the rest of the block has been loaded.

⟨Other cases for the fetch coroutine 298⟩ +≡
  bad_fetch: if (data->i == prego) goto fin_ex;
    data->interrupt |= PX_BIT;
  swym_one: fetched[0].h = fetched[0].l = SWYM << 24;
    goto fetch_one;
  case got_one: fetched[0] = data->x.o;  /* a “preview” of the new cache data */
  fetch_one: fetch_lo = 0; fetch_hi = 1;
    data->state = fetch_ready;
  case fetch_ready: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
    if (data->i == prego) goto fin_ex;
    for (j = 0; j < fetch_max; j++) {
      register fetch *new_tail;
      if (tail == fetch_bot) new_tail = fetch_top;
      else new_tail = tail - 1;
      if (new_tail == head) break;  /* fetch buffer is full */
      ⟨Install a new instruction into the tail position 304⟩;
      tail = new_tail;
      if (sleepy) {
        sleepy = false; sleep;
      }
      inst_ptr.o = incr(inst_ptr.o, 4);
      if (fetch_lo == fetch_hi) goto new_fetch;
    }
    wait(1);
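
The loop in section 301 treats the fetch buffer as a circular queue whose tail moves downward and wraps from the bottom slot to the top. A small index-based sketch of that discipline follows. It is an assumption-laden model, not the simulator's code: slots are numbered from 0 (bottom) to `top`, and the names `next_tail` and `has_room` are hypothetical.

```c
/* Hypothetical model of the circular fetch buffer, using slot indices
   in place of the simulator's fetch pointers. */
int next_tail(int tail, int top)
{
  return tail == 0 ? top : tail - 1;  /* wrap from the bottom slot to the top */
}

/* Returns 1 if one more instruction can be installed, 0 if the buffer is full. */
int has_room(int head, int tail, int top)
{
  return next_tail(tail, top) != head;
}
```

One slot always stays unused, which is how the test `new_tail == head` distinguishes a full buffer from the empty case `tail == head`.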

302. ⟨Insert dummy instruction for page table emulation 302⟩ ≡
  {
    if (cache_search(ITcache, trans_key(inst_ptr.o))) goto new_fetch;
    data->interrupt |= F_BIT;
    sleepy = true;
    goto swym_one;
  }

This code is used in section 298. 

303. ⟨Global variables 20⟩ +≡
  bool sleepy;  /* have we just emitted the page table emulation call? */
304. At this point we check for egregiously invalid instructions. (Sometimes the
dispatcher will actually allow such instructions to occupy the fetch buffer, for internally
generated commands.)

⟨Install a new instruction into the tail position 304⟩ ≡
  tail->loc = inst_ptr.o;
  if (inst_ptr.o.l & 4) tail->inst = fetched[fetch_lo++].l;
  else tail->inst = fetched[fetch_lo].h;
  tail->interrupt = data->interrupt;
  i = tail->inst >> 24;
  if (i >= RESUME && i <= SYNC && (tail->inst & bad_inst_mask[i - RESUME]))
    tail->interrupt |= B_BIT;
  tail->noted = false;
  if (inst_ptr.o.l == breakpoint.l && inst_ptr.o.h == breakpoint.h) breakpoint_hit = true;

This code is used in section 301. 

305. The commands RESUME, SAVE, UNSAVE, and SYNC should not have nonzero bits
in the positions defined here.

⟨Global variables 20⟩ +≡
  int bad_inst_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};
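
The validity check of sections 304 and 305 can be exercised on its own. The sketch below assumes the opcode values RESUME = #f9 and SYNC = #fc, and it assumes the mask values 0xfffffe, 0xffff, 0xffff00, and 0xfffff8 for RESUME, SAVE, UNSAVE, and SYNC respectively; the names `bad_mask` and `is_egregiously_bad` are hypothetical.

```c
#include <stdint.h>

enum { OP_RESUME = 0xf9, OP_SYNC = 0xfc };  /* assumed opcode values */

/* Assumed forbidden-bit masks for RESUME, SAVE, UNSAVE, SYNC, in that order. */
static const uint32_t bad_mask[4] = {0xfffffe, 0xffff, 0xffff00, 0xfffff8};

/* Returns 1 if inst is one of the four commands with a forbidden bit set. */
int is_egregiously_bad(uint32_t inst)
{
  uint32_t op = inst >> 24;
  if (op < OP_RESUME || op > OP_SYNC) return 0;  /* other opcodes are not checked here */
  return (inst & bad_mask[op - OP_RESUME]) != 0;
}
```

Under these assumptions `RESUME 1` (0xf9000001) passes and `RESUME 2` (0xf9000002) is flagged, as is any SAVE whose Y or Z byte is nonzero.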



306. Interrupts. The scariest thing about the design of a pipelined machine is
the existence of interrupts, which disrupt the smooth flow of a computation in ways 
that are difficult to anticipate. Fortunately, however, the discipline of a reorder buffer, 
which forces instructions to be committed in order, allows us to deal with interrupts 
in a fairly natural way. Our solution to the problems of dynamic scheduling and 
speculative execution therefore solves the interrupt problem as well. 

MMIX has three kinds of interrupts, which show up as bit codes in the interrupt field 
when an instruction is ready to be committed: H_BIT invokes a trip handler, for TRIP 
instructions and arithmetic exceptions; F_BIT invokes a forced-trap handler, for TRAP 
instructions and unimplemented instructions that need to be emulated in software; 
E_BIT invokes a dynamic-trap handler, for external interrupts like I/O signals or for 
internal interrupts caused by improper instructions. In all three cases, the pipeline 
control has already been redirected to fetch new instructions starting at the correct 
handler address by the time an interrupted instruction is ready to be committed. 

307. Most instructions come to the following part of the program, if they have
finished execution with any 1s among the eight trip bits or the eight trap bits.

If the trip bits aren’t all zero, we want to update the event bits of rA, or perform
an enabled trip handler, or both. If the trap bits are nonzero, we need to hold onto
them until we get to the hot seat, when they will be joined with the bits of rQ and
probably cause an interrupt. A load or store instruction with nonzero trap bits will
be nullified, not committed.

Underflow that is exact and not enabled is ignored, in accordance with the IEEE
standard conventions. (This applies also to underflow triggered by RESUME_SET.)

#define is_load_store(i) (i >= ld && i <= cswap)

⟨Handle interrupt at end of execution stage 307⟩ ≡
  {
    if ((data->interrupt & 0xff) && is_load_store(data->i)) goto state_5;
    j = data->interrupt & 0xff00;
    data->interrupt -= j;
    if ((j & (U_BIT + X_BIT)) == U_BIT && !(data->ra.o.l & U_BIT)) j &= ~U_BIT;
    data->arith_exc = (j & ~data->ra.o.l) >> 8;
    if (j & data->ra.o.l) ⟨Prepare for exceptional trip handler 308⟩;
    if (data->interrupt & 0xff) goto state_5;
  }

This code is used in section 144. 
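
The underflow rule of section 307 can be stated as a small pure function. The following sketch uses the bit values U_BIT = 1 << 10 and X_BIT = 1 << 8 defined in section 54; the function name `filter_underflow` is hypothetical, and `ra_l` stands for the low tetrabyte of rA.

```c
#define U_BIT (1u << 10)  /* floating underflow */
#define X_BIT (1u << 8)   /* floating inexact */

/* Hypothetical helper: drops an underflow that is exact (no X_BIT)
   and whose trip is not enabled in rA. */
unsigned filter_underflow(unsigned trips, unsigned ra_l)
{
  if ((trips & (U_BIT + X_BIT)) == U_BIT && !(ra_l & U_BIT))
    trips &= ~U_BIT;
  return trips;
}
```

An inexact underflow, or one whose trip is enabled, passes through unchanged; only the exact, disabled case is suppressed.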
308. Since execution is speculative, an exceptional condition might not be part
of the “real” computation. Indeed, the present coroutine might have already been
deissued.

⟨Prepare for exceptional trip handler 308⟩ ≡
  {
    i = issued_between(data, cool);
    if (i < deissues) goto die;
    deissues = i;
    old_tail = tail = head; resuming = 0;  /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    cool_hist = data->hist;
    for (i = j & data->ra.o.l, m = 16; !(i & D_BIT); i <<= 1, m += 16) ;
    data->arith_exc |= (j & ~(0x10000 >> (m >> 4))) >> 8;
      /* trips taken are not logged as events */
    data->go.o.h = 0, data->go.o.l = m;
    inst_ptr.o = data->go.o, inst_ptr.p = NULL;
    data->interrupt |= H_BIT;
    goto state_4;
  }

This code is used in section 307. 
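
The loop in section 308 selects the highest-priority enabled trip and converts it to a handler address that is a multiple of 16. A minimal sketch of that computation, together with the mask that keeps the taken trip out of the logged events, is shown below. The function names `trip_entry` and `logged_events` are hypothetical, and the guard against an empty argument is an addition for this sketch; the original code is reached only when at least one enabled trip bit is set.

```c
#define D_BIT (1u << 15)  /* integer divide check, the highest-priority trip */

/* Hypothetical helper: returns the handler address (16, 32, ..., 128)
   for the highest set bit among bits 8..15 of enabled. */
unsigned trip_entry(unsigned enabled)
{
  unsigned i, m;
  if (!(enabled & 0xff00)) return 0;  /* guard added for this sketch */
  for (i = enabled, m = 16; !(i & D_BIT); i <<= 1, m += 16) ;
  return m;
}

/* Event bits to log in rA: every raised trip except the one taken at m. */
unsigned logged_events(unsigned j, unsigned m)
{
  return (j & ~(0x10000u >> (m >> 4))) >> 8;
}
```

For example, if both underflow (bit 10) and inexact (bit 8) are raised and enabled, the handler at address 96 is chosen and only the inexact event is logged.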

309. ⟨Prepare to emulate the page translation 309⟩ ≡
  i = issued_between(data, cool);
  if (i < deissues) goto die;
  deissues = i;
  old_tail = tail = head; resuming = 0;  /* clear the fetch buffer */
  ⟨Restart the fetch coroutine 287⟩;
  cool_hist = data->hist;
  inst_ptr.p = UNKNOWN_SPEC;
  data->interrupt |= F_BIT;

This code is used in section 310. 






310. We need to stop dispatching when calling a trip handler from within the
reorder buffer, lest we issue an instruction that uses g[255] or rB as an operand.

⟨Special cases for states in the first stage 266⟩ +≡
  emulate_virt: ⟨Prepare to emulate the page translation 309⟩;
  state_4: data->state = 4;
  case 4: if (dispatch_lock) wait(1);
    set_lock(self, dispatch_lock);
  state_5: data->state = 5;
  case 5: if (data != old_hot) wait(1);
    if ((data->interrupt & F_BIT) && data->i != trap) {
      inst_ptr.o = g[rT].o, inst_ptr.p = NULL;
      if (is_load_store(data->i)) nullifying = true;
    }
    if (data->interrupt & 0xff) {
      g[rQ].o.h |= data->interrupt & 0xff;
      new_Q.h |= data->interrupt & 0xff;
      if (verbose & issue_bit) {
        printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
      }
    }
    goto die;

311. The instructions of the previous section appear in the switch for coroutine 
stage 1 only. We need to use them also in later stages. 

⟨Special cases for states in later stages 272⟩ +≡
  case 4: goto state_4;
  case 5: goto state_5;

312. ⟨Special cases of instruction dispatch 117⟩ +≡
  case trap: if ((flags[op] & X_is_dest_bit) && cool->xx < cool_G && cool->xx >= cool_L)
      goto increase_L;
    if (!g[rT].up->known || !g[rJ].up->known) goto stall;
    inst_ptr = specval(&g[rT]);  /* traps and emulated ops */
    cool->need_b = true, cool->b = specval(&g[255]);
  case trip:
    if (!g[rJ].up->known) goto stall;
    cool->ren_x = true, spec_install(&g[255], &cool->x);
    cool->x.known = true, cool->x.o = g[rJ].up->o;
    if (i == trip) cool->go.o = zero_octa;
    cool->ren_a = true, spec_install(&g[i == trap ? rBB : rB], &cool->a); break;

313. ⟨Cases for stage 1 execution 155⟩ +≡
  case trap: data->interrupt |= F_BIT; data->a.o = data->b.o; goto fin_ex;
  case trip: data->interrupt |= H_BIT; data->a.o = data->b.o; goto fin_ex;
314. The following check is performed at the beginning of every cycle. An
instruction in the hot seat can be externally interrupted only if it is ready to be committed
and not already marked for tripping or trapping.

⟨Check for external interrupt 314⟩ ≡
  g[rI].o = incr(g[rI].o, -1);
  if (g[rI].o.l == 0 && g[rI].o.h == 0) {
    g[rQ].o.l |= INTERVAL_TIMEOUT, new_Q.l |= INTERVAL_TIMEOUT;
    if (verbose & issue_bit) {
      printf(" setting rQ="); print_octa(g[rQ].o); printf("\n");
    }
  }
  trying_to_interrupt = false;
  if (((g[rQ].o.h & g[rK].o.h) || (g[rQ].o.l & g[rK].o.l)) && cool != hot &&
      !(hot->interrupt & (E_BIT + F_BIT + H_BIT)) && !doing_interrupt &&
      !(hot->i == resum)) {
    if (hot->owner) trying_to_interrupt = true;
    else {
      hot->interrupt |= E_BIT;
      ⟨Deissue all but the hottest command 316⟩;
      inst_ptr.o = g[rTT].o; inst_ptr.p = NULL;
    }
  }

This code is used in section 64. 

315. ⟨Global variables 20⟩ +≡
  bool trying_to_interrupt;  /* encouraging interruptible operations to pause */
  bool nullifying;  /* stopping dispatch to nullify a load/store command */



316. It’s possible that the command in the hot seat has been deissued, but only if
the simulator has done so at the user’s request. Otherwise the test ‘i >= deissues’ here
will always succeed.

The value of cool_hist becomes flaky here. We could try to keep it strictly up to
date, but the unpredictable nature of external interrupts suggests that we are better
off leaving it alone. (It’s only a heuristic for branch prediction, and a sufficiently
strong prediction will survive one-time glitches due to interrupts.)

⟨Deissue all but the hottest command 316⟩ ≡
  i = issued_between(hot, cool);
  if (i >= deissues) {
    deissues = i;
    tail = head; resuming = 0;  /* clear the fetch buffer */
    ⟨Restart the fetch coroutine 287⟩;
    if (is_load_store(hot->i)) nullifying = true;
  }

This code is used in section 314. 

317. Even though an interrupted instruction has officially been either “committed”
or “nullified,” it stays in the hot seat for two or three extra cycles, while we save
enough of the machine state to resume the computation later.

⟨Begin an interruption and break 317⟩ ≡
  {
    if (!(hot->interrupt & H_BIT)) g[rK].o = zero_octa;  /* trap */
    if (((hot->interrupt & H_BIT) && hot->i != trip) ||
        ((hot->interrupt & F_BIT) && hot->i != trap) ||
        (hot->interrupt & E_BIT)) doing_interrupt = 3, suppress_dispatch = true;
    else doing_interrupt = 2;  /* trip or trap started by dispatcher */
    break;
  }

This code is used in section 146. 

318. If a memory failure occurs, we should set rF here, either in case 2 or case 1.
The simulator doesn’t do anything with rF at present.

⟨Perform one cycle of the interrupt preparations 318⟩ ≡
  switch (doing_interrupt--) {
  case 3: ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩; break;
  case 2: ⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩; break;
  case 1: ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩;
    if (hot == reorder_bot) hot = reorder_top; else hot--;
    break;
  }

This code is used in section 64. 
319. ⟨Set resumption registers (rB, $255) or (rBB, $255) 319⟩ ≡
  j = hot->interrupt & H_BIT;
  g[j ? rB : rBB].o = g[255].o;
  g[255].o = g[rJ].o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rB="); print_octa(g[rB].o);
    } else {
      printf(" setting rBB="); print_octa(g[rBB].o);
    }
    printf(", $255="); print_octa(g[255].o); printf("\n");
  }

This code is used in section 318. 



320. Here’s where we manufacture the “ropcodes” for resumption.

#define RESUME_AGAIN 0  /* repeat the command in rX as if in location rW - 4 */
#define RESUME_CONT 1   /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2    /* set register $X to rZ */
#define RESUME_TRANS 3  /* install (rY, rZ) into IT-cache or DT-cache, then RESUME_AGAIN */
#define pack_bytes(a, b, c, d) ((((((unsigned) (a) << 8) + (b)) << 8) + (c)) << 8) + (d)

⟨Set resumption registers (rW, rX) or (rWW, rXX) 320⟩ ≡
  j = pack_bytes(hot->op, hot->xx, hot->yy, hot->zz);
  if (hot->interrupt & H_BIT) {  /* trip */
    g[rW].o = incr(hot->loc, 4);
    g[rX].o.h = sign_bit, g[rX].o.l = j;
    if (verbose & issue_bit) {
      printf(" setting rW="); print_octa(g[rW].o);
      printf(", rX="); print_octa(g[rX].o); printf("\n");
    }
  } else {  /* trap */
    g[rWW].o = hot->go.o;
    g[rXX].o.l = j;
    if (hot->interrupt & F_BIT) {  /* forced */
      if (hot->i != trap) j = RESUME_TRANS;  /* emulate page translation */
      else if (hot->op == TRAP) j = 0x80;  /* TRAP */
      else if (flags[internal_op[hot->op]] & X_is_dest_bit) j = RESUME_SET;  /* emulation */
      else j = 0x80;  /* emulation when r[X] is not a destination */
    } else {  /* dynamic */
      if (hot->interim)
        j = (hot->i == frem || hot->i == syncd || hot->i == syncid ? RESUME_CONT : RESUME_AGAIN);
      else if (is_load_store(hot->i)) j = RESUME_AGAIN;
      else j = 0x80;  /* normal external interruption */
    }
    g[rXX].o.h = (j << 24) + (hot->interrupt & 0xff);
    if (verbose & issue_bit) {
      printf(" setting rWW="); print_octa(g[rWW].o);
      printf(", rXX="); print_octa(g[rXX].o); printf("\n");
    }
  }

This code is used in section 318. 
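
The two halves of rXX built in section 320 are simple bit fields, and it may help to see them assembled in isolation. The sketch below is illustrative only: `pack4` mirrors the pack_bytes macro as a function, and `rxx_high` is a hypothetical name for the computation of the high tetrabyte from a ropcode and the low eight interrupt bits.

```c
#include <stdint.h>

/* Function form of the pack_bytes macro: four bytes, most significant first. */
uint32_t pack4(unsigned a, unsigned b, unsigned c, unsigned d)
{
  return ((((((uint32_t)a << 8) + b) << 8) + c) << 8) + d;
}

/* Hypothetical helper: high tetrabyte of rXX, with the ropcode (or 0x80)
   in the top byte and the trap bits of the interrupt in the bottom byte. */
uint32_t rxx_high(unsigned ropcode, unsigned interrupt)
{
  return ((uint32_t)ropcode << 24) + (interrupt & 0xff);
}
```

A top byte of 0x80 sets the sign bit of rXX, which is how the RESUME code later recognizes that no instruction needs to be reinserted.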
321. ⟨Set resumption registers (rY, rZ) or (rYY, rZZ) 321⟩ ≡
  j = hot->interrupt & H_BIT;
  if ((hot->interrupt & F_BIT) && hot->op == SWYM) g[rYY].o = hot->go.o;
  else g[j ? rY : rYY].o = hot->y.o;
  if (hot->i == st || hot->i == pst) g[j ? rZ : rZZ].o = hot->x.o;
  else g[j ? rZ : rZZ].o = hot->z.o;
  if (verbose & issue_bit) {
    if (j) {
      printf(" setting rY="); print_octa(g[rY].o);
      printf(", rZ="); print_octa(g[rZ].o); printf("\n");
    } else {
      printf(" setting rYY="); print_octa(g[rYY].o);
      printf(", rZZ="); print_octa(g[rZZ].o); printf("\n");
    }
  }

This code is used in section 318. 



322. Whew; we’ve successfully interrupted the computation. The remaining task 
is to restart it again, as transparently as possible. 

The RESUME instruction waits for the pipeline to drain, because it has to do such 
drastic things. For example, an interrupt may be occurring at this very moment, 
changing the registers needed for resumption. 

⟨Special cases of instruction dispatch 117⟩ +≡
case resume: if (cool != old_hot) goto stall;
  inst_ptr = specval(&g[cool->zz ? rWW : rW]);
  if (!(cool->loc.h & sign_bit)) {
    if (cool->zz) cool->interrupt |= K_BIT;
    else if (inst_ptr.o.h & sign_bit) cool->interrupt |= P_BIT;
  }
  if (cool->interrupt) {
    inst_ptr.o = incr(cool->loc, 4); cool->i = noop;
  } else {
    cool->go.o = inst_ptr.o;
    if (cool->zz) {
      ⟨Magically do an I/O operation, if cool->loc is rT 372⟩;
      cool->ren_a = true, spec_install(&g[rK], &cool->a);
      cool->a.known = true, cool->a.o = g[255].o;
      cool->ren_x = true, spec_install(&g[255], &cool->x);
      cool->x.known = true, cool->x.o = g[rBB].o;
    }
    cool->b = specval(&g[cool->zz ? rXX : rX]);
    if (!(cool->b.o.h & sign_bit)) ⟨Resume an interrupted operation 323⟩;
  }
  break;

323. Here we set cool->i = resum, since we want to issue another instruction after 
the RESUME itself. 

The restrictions on inserted instructions are designed to ensure that those 
instructions will be the very next ones issued. (If, for example, an incgamma 
instruction were necessary, it might cause a page fault and we’d lose the operand 
values for RESUME_SET or RESUME_CONT.) 

A subtle point arises here: If RESUME_TRANS is being used to compute the page 
translation of virtual address zero, we don’t want to execute the dummy SWYM 
instruction from virtual address −4! So we avoid the SWYM altogether. 

⟨Resume an interrupted operation 323⟩ ≡
{
  cool->xx = cool->b.o.h >> 24, cool->i = resum;
  head->loc = incr(inst_ptr.o, -4);
  switch (cool->xx) {
  case RESUME_SET: cool->b.o.l = (SETH << 24) + (cool->b.o.l & 0xff0000);
    head->interrupt |= cool->b.o.h & 0xff00;
    resuming = 2;
  case RESUME_CONT: resuming += 1 + cool->zz;
    if (((cool->b.o.l >> 24) & 0xfa) != 0xb8) { /* not syncd or syncid */
      m = cool->b.o.l >> 28;
      if ((1 << m) & 0x8f30) goto bad_resume;
      m = (cool->b.o.l >> 16) & 0xff;
      if (m >= cool_L && m < cool_G) goto bad_resume;
    }
  case RESUME_AGAIN: resume_again: head->inst = cool->b.o.l;
    m = head->inst >> 24;
    if (m == RESUME) goto bad_resume; /* avoid uninterruptible loop */
    if (!cool->zz && m > RESUME && m <= SYNC && (head->inst & bad_inst_mask[m - RESUME]))
      head->interrupt |= B_BIT;
    head->noted = false; break;
  case RESUME_TRANS: if (cool->zz) {
      cool->y = specval(&g[rYY]), cool->z = specval(&g[rZZ]);
      if ((cool->b.o.l >> 24) != SWYM) goto resume_again;
      cool->i = resume; break; /* see “subtle point” above */
    }
  default: bad_resume: cool->interrupt |= B_BIT, cool->i = noop;
    resuming = 0; break;
  }
}

This code is used in section 322. 
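
The two tests applied under RESUME_CONT can be read as bit-set membership checks on the instruction word held in the low half of rX or rXX. The sketch below is only an illustration of that reading; the helper names are invented here, and `L` and `G` stand in for cool_L and cool_G.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the top nybble of the opcode is one of
   4, 5, 8, 9, 10, 11 or 15, the set encoded by the mask 0x8f30. */
static int nybble_rejected(uint32_t inst)
{
    unsigned m = inst >> 28;            /* top four bits of the opcode */
    return ((1u << m) & 0x8f30u) != 0;
}

/* Hypothetical helper: nonzero if the X field names a marginal register,
   that is, a register number m with L <= m < G. */
static int x_is_marginal(uint32_t inst, unsigned L, unsigned G)
{
    unsigned m = (inst >> 16) & 0xff;   /* X field of the instruction */
    return m >= L && m < G;
}
```

Either condition sends the simulator to bad_resume, so an instruction word passes only when both helpers return zero.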



a: specnode, §44. 
b: spec, §44. 

B_BIT = 1 < 2, §54. 

badAnstjmask: int [], §305. 

cool: control *, §60. 

cooLG: int, §99. 

cooLL: int, §99. 

false = 0, §11. 

g: specnode [], §86. 

go: specnode, §44. 

h: tetra, §17. 

head: fetch §69. 

i: inter nal.opcode, §44. 

incgamma = 84, §49. 

incr: octa (), mmix-ARITH §6. 

inst: tetra, §68. 

instjptr: spec, §284. 

interrupt: unsigned int, §44. 

interrupt: unsigned int, §68. 

K_BIT = 1 < 3, §54. 

known: bool, §40. 

1: tetra, §17. 



loc: octa, §44. 
loc: octa, §68. 
m: register int, §12. 
noop = 81, §49. 
noted: bool, §68. 
o: octa, §40. 
old.hot: control *, §60. 
P_BIT = 1 < 0, §54. 
rBB = 7, §52. 
ren.a: bool, §44. 
ren.x: bool, §44. 
resum = 89, §49. 
resume = 76, §49. 

RESUME = 9, §47. 

RESUME. AGAIN =0, §320. 
RESUME_C0NT = 1, §320. 
RESUME_SET = 2, §320. 
RESUME.TRANS = 3, §320. 
resuming: int, §78. 
rK = 15, §52. 

rW =24, §52. 



rWW =28, §52. 
rX =25, §52. 
rXX =29, §52. 
rYY =30, §52. 
rZZ = 31, §52. 

SETH = #e0, §47. 
sign.bit = macro, §80. 
specXnstall: static void (), 
§95. 

specval: static spec (), §93. 
stall: label, §75. 

SWYM = #fd, §47. 

SYNC = ^fc, §47. 
syncd = 64, §49. 
syncid = 65, §49. 
true = 1, §11. 
x: specnode, §44. 
xx: unsigned char, §44. 
y: spec, §44. 

.z: spec, §44. 

zz: unsigned char, §44. 
324. ⟨Insert special operands when resuming an interrupted operation 324⟩ ≡
{
  if (resuming & 1) {
    cool->y = specval(&g[rY]);
    cool->z = specval(&g[rZ]);
  } else {
    cool->y = specval(&g[rYY]);
    cool->z = specval(&g[rZZ]);
  }
  if (resuming >= 3) { /* RESUME_SET */
    cool->need_ra = true, cool->ra = specval(&g[rA]);
  }
  cool->usage = false;
}

This code is used in section 103. 

325. #define do_resume_trans 17 /* state for performing RESUME_TRANS actions */

⟨Cases for stage 1 execution 155⟩ +≡
case resume: case resum: if (data->xx != RESUME_TRANS) goto fin_ex;
  data->ptr_a = (void *)((data->b.o.l >> 24) == SWYM ? ITcache : DTcache);
  data->state = do_resume_trans;
  data->z.o = incr(oandn(data->z.o, page_mask), data->z.o.l & 7);
data^z.o.h &= 
  goto resume_trans;

326. ⟨Special cases for states in the first stage 266⟩ +≡
case do_resume_trans: resume_trans:
  { register cache *c = (cache *) data->ptr_a;
    if (c->lock) wait(1);
    if (c->filler.next) wait(1);
    p = alloc_slot(c, trans_key(data->y.o));
    if (p) {
      c->filler_ctl.ptr_b = (void *) p;
      c->filler_ctl.y.o = data->y.o;
      c->filler_ctl.b.o = data->z.o;
      c->filler_ctl.state = 1;
      schedule(&c->filler, c->access_time, 1);
    }
    goto fin_ex;
  }






MMIX-PIPE: ADMINISTRATIVE OPERATIONS 



327. Administrative operations. The internal instructions that handle the 
register stack simply reduce to things we already know how to do. (Well, the internal 
instructions for saving and unsaving do sometimes lead to special cases, based on 
data->op; for the most part, though, the necessary mechanisms are already present.) 

⟨Cases for stage 1 execution 155⟩ +≡
case noop: if (data->interrupt & F_BIT) goto emulate_virt;
case incrl: case unsave: goto fin_ex;
case jmp: case pushj: data->go.o = data->z.o;
  goto fin_ex;
case sav: if (!(data->mem_x)) goto fin_ex;
case incgamma: case save: data->i = st;
  goto switch1;
case decgamma: case unsav: data->i = ld;
  goto switch1;

328. We can GET special registers ≥ 21 (that is, rA, rF, rP, rW–rZ, or rWW–rZZ) 
only in the hot seat, because those registers are implicit outputs of many instructions. 

The same applies to rK, since it is changed by TRAP and by emulated instructions. 
Likewise, rQ must not be prematurely gotten. 

⟨Cases for stage 1 execution 155⟩ +≡
case get: if (data->zz >= 21 || data->zz == rK || data->zz == rQ) {
    if (data != old_hot) wait(1);
    data->z.o = g[data->zz].o;
  }
  data->x.o = data->z.o; goto fin_ex;
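
The condition guarding the hot-seat wait can be isolated as a small predicate. This is a sketch for illustration only; the function name `get_needs_hot_seat` is invented, and the register numbers are the ones listed for section 52.

```c
/* Special-register numbers, as listed for section 52. */
enum { rK = 15, rQ = 16, rA = 21 };

/* Hypothetical predicate: must a GET of special register zz wait until
   the instruction reaches the hot seat? */
static int get_needs_hot_seat(unsigned zz)
{
    return zz >= rA || zz == rK || zz == rQ;
}
```

Registers such as rG (19) and rL (20) fall outside the predicate, so a GET of those can proceed without waiting.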



access.time: int, §167. 
alloc^slot: static cacheblock 

*0, §205. 
b: spec, §44. 
cache = struct, §167. 
cool: control *, §60. 
data: register control *, 
§124. 

deegamma = 85, §49. 

DTcache: cache §168. 
emulate.virt: label, §310. 

F_BIT = 1 < 17, §54. 

false = 0, §11. 

filler: coroutine, §167. 

filler^ctl: control, §167. 

fin.ex: label, §144. 

q: specnode [1, §86. 

c/ei = 54, §49. 

go: specnode, §44. 

h: tetra, §17. 

i: internal.opcode, §44. 

inegamma = 84, §49. 

incr: octa (), mmix-ARITH §6. 

incrl = 86, §49. 

interrupt: unsigned int, §44. 
ITcache: cache *, §168. 



jmp = 80, §49. 

1: tetra, §17. 

Id = 56, §49. 
lock: lockvar, §167. 
mem.x: bool, §44. 
need^ra: bool, §44. 
next: coroutine *, §23. 
noop = 81, §49. 
o: octa, §40. 
oandn: octa (), 
MMIX-ARITH §25. 
oldJiot: control *, §60. 
op: mmix.opcode, §44. 
p: register cacheblock *, 
§258. 

page^mask: octa, §238. 
ptr^a: void *, §44. 
ptrj): void *, §44. 
pushj = 71, §49. 
rA = 21, §52. 
ra: spec, §44. 
resum = 89, §49. 
resume = 76, §49. 
RESUME_SET = 2, §320. 
RESUME_TRANS = 3, §320. 
resuming: int, §78. 



rK = 15, §52. 
rQ = 16, §52. 
rY = 26, §52. 
rYY =30, §52. 
rZ = 27, §52. 
rZZ = 31, §52. 
sav = 87, §49. 
save = 77, §49. 

schedule: static void (), §28. 
speeval: static spec (), §93. 
st = 63, §49. 
state: int, §44. 
switchl : label, §130. 

SWYM = #fd, §47. 
transJzey = (), §240. 

true = 1, §11. 
unsav = 88, §49. 
unsave = 78, §49. 
usage: bool, §44. 
wait =mdiCro (), §125. 
x: specnode, §44. 

XX : unsigned char, §44. 
y: spec, §44. 

.z: spec, §44. 

zz: unsigned char, §44. 
329. A PUT is, similarly, delayed in the cases that hold dispatch_lock. This program 
does not restrict the 1 bits that might be PUT into rQ, although the contents of that 
register can have drastic implications. 

⟨Cases for stage 1 execution 155⟩ +≡
case put: if (data->xx == 8 || (data->xx >= 15 && data->xx <= 20)) {
    if (data != old_hot) wait(1);
    switch (data->xx) {
    case rV: ⟨Update the page variables 239⟩; break;
    case rQ: new_Q.h |= data->z.o.h & ~g[rQ].o.h; new_Q.l |= data->z.o.l & ~g[rQ].o.l;
      data->z.o.l |= new_Q.l; data->z.o.h |= new_Q.h; break;
    case rL: if (data->z.o.h != 0) data->z.o.h = 0, data->z.o.l = g[rL].o.l;
      else if (data->z.o.l > g[rL].o.l) data->z.o.l = g[rL].o.l;
    default: break;
    case rG: ⟨Update rG 330⟩; break;
    }
  } else if (data->xx == rA && (data->z.o.h != 0 || data->z.o.l >= 0x40000))
    data->interrupt |= B_BIT, data->z.o.h = 0, data->z.o.l &= 0x3ffff;
  data->x.o = data->z.o; goto fin_ex;
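
Two of the cases above amount to sanitizing the operand before it is stored. The sketch below restates them on plain 64-bit integers; the function names are invented for this example, and the `illegal` flag merely stands in for setting B_BIT.

```c
#include <stdint.h>

/* Hypothetical helper: the value that PUT rL actually stores, given the
   current rL. The register can only stay the same or decrease. */
static uint64_t put_rL_value(uint64_t z, uint32_t cur_rL)
{
    if ((z >> 32) != 0) return cur_rL;        /* high tetrabyte nonzero */
    if ((uint32_t)z > cur_rL) return cur_rL;  /* attempt to increase rL */
    return z;
}

/* Hypothetical helper: the value that PUT rA stores. *illegal is set to 1
   when the simulator would raise B_BIT, and only 18 bits are kept. */
static uint64_t put_rA_value(uint64_t z, int *illegal)
{
    *illegal = (z >> 32) != 0 || (uint32_t)z >= 0x40000u;
    return *illegal ? (z & 0x3ffffu) : z;
}
```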

330. When rG decreases, we assume that up to commit_max marginal registers can 
be zeroed during each clock cycle. (Remember that we’re currently in the hot seat, 
and holding dispatch_lock.) 

⟨Update rG 330⟩ ≡
if (data->z.o.h != 0 || data->z.o.l >= 256 || data->z.o.l < g[rL].o.l || data->z.o.l < 32)
  data->interrupt |= B_BIT, data->z.o = g[rG].o;
else if (data->z.o.l < g[rG].o.l) {
  data->interim = true; /* potentially interruptible */
  for (j = 0; j < commit_max; j++) {
    g[rG].o.l--;
    g[g[rG].o.l].o = zero_octa;
    if (data->z.o.l == g[rG].o.l) break;
  }
  if (j == commit_max) {
    if (!trying_to_interrupt) wait(1);
  } else data->interim = false;
}

This code is used in section 329. 
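
To see how the per-cycle zeroing behaves, here is a simplified model that works on a plain array instead of the simulator's specnode table. The function names and the array representation are assumptions made for this sketch; the caller is expected to pass a target below the current rG.

```c
#include <stdint.h>

/* Hypothetical check: would PUT rG with value z be rejected with B_BIT? */
static int rG_value_illegal(uint64_t z, uint32_t cur_rL)
{
    uint32_t l = (uint32_t)z;
    return (z >> 32) != 0 || l >= 256 || l < cur_rL || l < 32;
}

/* Hypothetical model of one clock cycle of lowering rG toward target,
   zeroing each register that becomes global. Returns 1 when the target
   was reached in this cycle, 0 if another cycle is needed. */
static int lower_rG_one_cycle(uint64_t g[256], uint32_t *rG,
                              uint32_t target, int commit_max)
{
    int j;
    for (j = 0; j < commit_max; j++) {
        (*rG)--;
        g[*rG] = 0;
        if (*rG == target) break;
    }
    return j != commit_max;
}
```

With commit_max equal to 4, lowering rG from 40 to 34 takes two cycles in this model: the first zeroes registers 39 through 36, and the second zeroes 35 and 34.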

331. Computed jumps put the desired destination address into the go field. 

⟨Cases for stage 1 execution 155⟩ +≡
case go: data->x.o = data->go.o; goto add_go;
case pop: data->x.o = data->y.o;
  data->y.o = data->b.o; /* move rJ to y field */
case pushgo: add_go: data->go.o = oplus(data->y.o, data->z.o);
  if ((data->go.o.h & sign_bit) && !(data->loc.h & sign_bit)) data->interrupt |= P_BIT;
  data->go.known = true; goto fin_ex;
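
The privilege test on a computed jump compares the sign bits of the target and of the instruction's own location. A minimal sketch on 64-bit integers follows; the function names are invented, and ordinary unsigned addition is assumed to play the role of oplus.

```c
#include <stdint.h>

/* Hypothetical helper: jump target, computed modulo 2^64. */
static uint64_t go_target(uint64_t y, uint64_t z)
{
    return y + z;
}

/* Hypothetical helper: nonzero when a user-space instruction (location
   with sign bit clear) tries to jump to a negative address, the case in
   which the simulator sets P_BIT. */
static int jump_is_privileged_violation(uint64_t loc, uint64_t target)
{
    const uint64_t sign = UINT64_C(1) << 63;
    return (target & sign) != 0 && (loc & sign) == 0;
}
```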
332. The instruction UNSAVE z generates a sequence of internal instructions that 
accomplish the actual unsaving. This sequence is controlled by the instruction currently 
in the fetch buffer, which changes its X and Y fields until all global registers have been 
loaded. The first instructions of the sequence are UNSAVE 0,0,z; UNSAVE 1,rZ,z−8; 
UNSAVE 1,rY,z−16; ...; UNSAVE 1,rB,z−96; UNSAVE 2,255,z−104; UNSAVE 2,254,z−112; 
etc. If an interrupt occurs before these instructions have all been committed, the 
execution register will contain enough information to restart the process. 

After the global registers have all been loaded, UNSAVE continues by acting rather 
like POP. An interrupt occurring during this last stage will find rS < rO; a context 
switch might then take us back to restoring the local registers again. But no 
information will be lost, even though the register from which we began unsaving has long 
since been replaced. 

⟨Special cases of instruction dispatch 117⟩ +≡
case unsave: if (cool->interrupt & B_BIT) cool->i = noop;
  else {
    cool->interim = true;
    op = LDOU; /* this instruction needs to be handled by load/store unit */
    cool->i = unsav;
    switch (cool->xx) {
    case 0: if (cool->z.p) goto stall;
      ⟨Set up the first phase of unsaving 334⟩; break;
    case 1: case 2: ⟨Generate an instruction to unsave g[yy] 333⟩; break;
    case 3: cool->i = unsave, cool->interim = false, op = UNSAVE;
      goto pop_unsave;
    default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
    }
  }
  break; /* this takes us to dispatch_done */



b: spec, §44. 

B_BIT = 1 < 2, §54. 
commit.max: int, §59. 
cool: control *, §60. 
data: register control *, 
§124. 

dispatch.done: label, §101. 
dispatch.lock: lockvar, §65. 
false = 0, §11. 
fin.ex: label, §144. 
g: specnode [], §86. 
go =72, §49. 
go: specnode, §44. 
h: tetra, §17. 
i: internal.opcode, §44. 
interim: bool, §44. 
interrupt: unsigned int, §44. 
j: register int, §12. 
known: bool, §40. 



1: tetra, §17. 

LD0U = =^8e, §47. 
loc: octa, §44. 
new.Q: octa, §148. 
noop = 81, §49. 
o: octa, §40. 
old.hot: control *, §60. 
op: mmix.opcode, §44. 

Oplus: octa (), MMIX-ARITH §5. 
p: specnode *, §40. 

P_BIT = 1 < 0, §54. 
pop = 75, §49. 
pop^unsave: label, §120. 
pushgo = 74, §49. 
put = 55, §49. 
rA = 21, §52. 
rG = 19, §52. 
rL = 20, §52. 



rQ = 16, §52. 
rV = 18, §52. 
sign.bit = macro, §80. 
stall: label, §75. 
true = 1, §11. 

trying J-ojinterrupt: bool, 
§315. 

unsav = 88, §49. 

UNSAVE = ^fb, §47. 
unsave = 78, §49. 
wait = macro (), §125. 
x: specnode, §44. 

XX : unsigned char, §44. 
y: spec, §44. 
yy: unsigned char, §44. 
2 :: spec, §44. 
zero.octa: octa, 
MMIX-ARITH §4. 






333. ⟨Generate an instruction to unsave g[yy] 333⟩ ≡
cool->ren_x = true, spec_install(&g[cool->yy], &cool->x);
new_O = new_S = incr(cool_O, -1);
cool->z.o = shift_left(new_O, 3);
cool->ptr_a = (void *) mem.up;

This code is used in section 332. 

334. ⟨Set up the first phase of unsaving 334⟩ ≡
cool->ren_x = true, spec_install(&g[rG], &cool->x);
cool->ren_a = true, spec_install(&g[rA], &cool->a);
new_O = new_S = shift_right(cool->z.o, 3, 1);
cool->set_l = true, spec_install(&g[rL], &cool->rl);
cool->ptr_a = (void *) mem.up;

This code is used in section 332. 

335. ⟨Get ready for the next step of UNSAVE 335⟩ ≡
switch (cool->xx) {
case 0: head->inst = pack_bytes(UNSAVE, 1, rZ, 0); break;
case 1: if (cool->yy == rP) head->inst = pack_bytes(UNSAVE, 1, rR, 0);
  else if (cool->yy == 0) head->inst = pack_bytes(UNSAVE, 2, 255, 0);
  else head->inst = pack_bytes(UNSAVE, 1, cool->yy - 1, 0); break;
case 2: if (cool->yy == cool_G) head->inst = pack_bytes(UNSAVE, 3, 0, 0);
  else head->inst = pack_bytes(UNSAVE, 2, cool->yy - 1, 0); break;
}

This code is used in section 81. 
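
The stepping rule of section 335 can be exercised in isolation to confirm the sequence described in section 332. In this sketch `pack_bytes` is written as a function rather than the macro of section 320, and `next_unsave` is an invented name; the register numbers are those listed for section 52.

```c
#include <stdint.h>

enum { UNSAVE = 0xfb, rR = 6, rP = 23, rZ = 27 };

/* Stand-in for the pack_bytes macro: four bytes into one tetrabyte. */
static uint32_t pack_bytes(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* Hypothetical function: the internal UNSAVE instruction that follows one
   with fields xx and yy, where G is the current global threshold.
   Returns 0 for xx >= 3, where no further instruction is generated. */
static uint32_t next_unsave(unsigned xx, unsigned yy, unsigned G)
{
    switch (xx) {
    case 0: return pack_bytes(UNSAVE, 1, rZ, 0);
    case 1:
        if (yy == rP) return pack_bytes(UNSAVE, 1, rR, 0); /* skip 22..7 */
        if (yy == 0) return pack_bytes(UNSAVE, 2, 255, 0); /* globals next */
        return pack_bytes(UNSAVE, 1, yy - 1, 0);
    case 2:
        if (yy == G) return pack_bytes(UNSAVE, 3, 0, 0);   /* final phase */
        return pack_bytes(UNSAVE, 2, yy - 1, 0);
    }
    return 0;
}
```

Starting from UNSAVE 0 and iterating, phase 1 visits special registers 27 down to 23 and then 6 down to 0, twelve in all, which agrees with the offsets z−8 through z−96 quoted in section 332.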

336. ⟨Handle an internal UNSAVE when it’s time to load 336⟩ ≡
if (data->xx == 0) {
  data->a.o = data->x.o; data->a.o.h &= 0xffffff; /* unsaved rA */
  data->x.o.l = data->x.o.h >> 24; data->x.o.h = 0; /* unsaved rG */
  if (data->a.o.h || (data->a.o.l & 0xfffc0000)) {
    data->a.o.h = 0, data->a.o.l &= 0x3ffff; data->interrupt |= B_BIT;
  }
  if (data->x.o.l < 32) {
    data->x.o.l = 32; data->interrupt |= B_BIT;
  }
}
goto fin_ex;

This code is used in section 279. 

337. Of course SAVE is handled essentially like UNSAVE, but backwards. 

⟨Special cases of instruction dispatch 117⟩ +≡
case save: if (cool->xx < cool_G) cool->interrupt |= B_BIT;
  if (cool->interrupt & B_BIT) cool->i = noop;
  else if (((cool_S.l - cool_O.l - cool_L - 1) & lring_mask) == 0)
    ⟨Insert an instruction to advance gamma 113⟩
  else {
    cool->interim = true;
    cool->i = sav;
    switch (cool->zz) {
    case 0: ⟨Set up the first phase of saving 338⟩; break;
    case 1: if (cool_O.l != cool_S.l) ⟨Insert an instruction to advance gamma 113⟩
      cool->zz = 2; cool->yy = cool_G;
    case 2: case 3: ⟨Generate an instruction to save g[yy] 339⟩; break;
    default: cool->interim = false, cool->i = noop, cool->interrupt |= B_BIT; break;
    }
  }
  break;

338. If an interrupt occurs during the first phase, say between two incgamma 
instructions, the value cool->zz = 1 will get things restarted properly. (Indeed, if 
context is saved and unsaved during the interrupt, many incgamma instructions may 
no longer be necessary.) 

⟨Set up the first phase of saving 338⟩ ≡
cool->zz = 1;
cool->ren_x = true, spec_install(&l[(cool_O.l + cool_L) & lring_mask], &cool->x);
cool->x.known = true, cool->x.o.h = 0, cool->x.o.l = cool_L;
cool->set_l = true, spec_install(&g[rL], &cool->rl);
new_O = incr(cool_O, cool_L + 1);

This code is used in section 337. 

339. ⟨Generate an instruction to save g[yy] 339⟩ ≡
op = STOU; /* this instruction needs to be handled by load/store unit */
cool->mem_x = true, spec_install(&mem, &cool->x);
cool->z.o = shift_left(cool_O, 3);
new_O = new_S = incr(cool_O, 1);
if (cool->zz == 3 && cool->yy > rZ) ⟨Do the final SAVE 340⟩
else cool->b = specval(&g[cool->yy]);

This code is used in section 337. 



a: specnode, §44. 
b: spec, §44. 

B_BIT = 1 < 2, §54. 
cool: control *, §60. 
cooLG: int, §99. 
cooLL: int, §99. 
cooLO: octa, §98. 
cooLS: octa, §98. 
data: register control *, 
§124. 

false = 0, §11. 

fin.ex: label, §144. 

g: specnode [], §86. 

h: tetra, §17. 

head: fetch §69. 

i: internal.opcode, §44. 

incgamma = 84, §49. 

incr: octa (), mmix-ARITH §6. 

inst: tetra, §68. 

interim: bool, §44. 

interrupt: unsigned int, §44. 



known: bool, §40. 

1 : tetra, §17. 

1 : specnode *, §86. 
Iring.mask: int, §88. 
mem: specnode, §115. 
mem.x: bool, §44. 
new.O: octa, §99. 
new.S: octa, §99. 
noop = 81, §49. 
o: octa, §40. 

op: register mmix.opcode, 

§75. 

pack.hytes = T[iQ.cxo (), §320. 

ptr^a: void *, §44. 

rA = 21, §52. 

ren.a: bool, §44. 

ren.x: bool, §44. 

rG = 19, §52. 

rl: specnode, §44. 

rL = 20, §52. 

rP = 23, §52. 



rR = 6 , §52. 
rZ = 27, §52. 
sav = 87, §49. 
save = 77, §49. 
setJ: bool, §44. 
shift Jeft: octa (), 
MMIX-ARITH §7. 
shift.right: octa {), 
MMIX-ARITH §7. 
spec^install: static void (), 
§95. 

specval: static spec (), §93. 
ST0U = ^ae, §47. 
true = 1, §11. 

UNSAVE = ^fb, §47. 
up: specnode *, §40. 
x: specnode, §44. 

XX : unsigned char, §44. 
yy: unsigned char, §44. 
spec, §44. 

zz: unsigned char, §44. 
340. The final SAVE instruction not only stores rG and rA, it also places the final 
address in global register X. 

⟨Do the final SAVE 340⟩ ≡
{
  cool->i = save;
  cool->interim = false;
  cool->ren_a = true, spec_install(&g[cool->xx], &cool->a);
}

This code is used in section 339. 

341. ⟨Get ready for the next step of SAVE 341⟩ ≡
switch (cool->zz) {
case 1: head->inst = pack_bytes(SAVE, cool->xx, 0, 1); break;
case 2: if (cool->yy == 255) head->inst = pack_bytes(SAVE, cool->xx, 0, 3);
  else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 2); break;
case 3: if (cool->yy == rR) head->inst = pack_bytes(SAVE, cool->xx, rP, 3);
  else head->inst = pack_bytes(SAVE, cool->xx, cool->yy + 1, 3); break;
}

This code is used in section 81. 

342. ⟨Handle an internal SAVE when it’s time to store 342⟩ ≡
{
  if (data->interim) data->x.o = data->b.o;
  else {
    if (data != old_hot) wait(1); /* we need the hottest value of rA */
    data->x.o.h = g[rG].o.l << 24;
    data->x.o.l = g[rA].o.l;
    data->a.o = data->y.o;
  }
  goto fin_ex;
}

This code is used in section 281. 
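
The last octabyte written by SAVE carries rG in its top byte and rA in its low tetrabyte, and section 336 takes it apart again during UNSAVE. The sketch below models both directions on a 64-bit integer. The function names are invented for this example, and the 24-bit mask on the high tetrabyte is an assumption about section 336, not a quotation of it.

```c
#include <stdint.h>

/* Hypothetical helper: the octabyte stored by the final SAVE step,
   following x.o.h = rG << 24 and x.o.l = rA. */
static uint64_t pack_rG_rA(uint32_t rG, uint32_t rA)
{
    uint32_t h = rG << 24;
    return ((uint64_t)h << 32) | rA;
}

/* Hypothetical helper: split the octabyte and sanitize both values.
   Returns 1 if the simulator would raise B_BIT, otherwise 0. */
static int unpack_rG_rA(uint64_t x, uint32_t *rG, uint32_t *rA)
{
    uint32_t a_h = (uint32_t)(x >> 32) & 0xffffffu; /* assumed mask */
    uint32_t a_l = (uint32_t)x;
    int bad = 0;
    *rG = (uint32_t)(x >> 56);
    if (a_h || (a_l & 0xfffc0000u)) {   /* rA must fit in 18 bits */
        a_l &= 0x3ffffu;
        bad = 1;
    }
    if (*rG < 32) {                     /* rG may not drop below 32 */
        *rG = 32;
        bad = 1;
    }
    *rA = a_l;
    return bad;
}
```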
343. More register-to-register ops. Now that we’ve finished most of the hard 
stuff, we can relax and fill in the holes that we left in the all-register parts of the 
execution stages. 

First let’s complete the fixed point arithmetic operations, by dispensing with 
multiplication and division. 

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case mulu: data->x.o = omult(data->y.o, data->z.o);
  data->a.o = aux;
  goto quantify_mul;
case mul: data->x.o = signed_omult(data->y.o, data->z.o);
  if (overflow) data->interrupt |= V_BIT;
quantify_mul: aux = data->z.o;
  for (j = mul0; aux.l || aux.h; j++) aux = shift_right(aux, 8, 1);
  data->i = j; break; /* j is mul0 or mul1 or ... or mul8 */
case divu: data->x.o = odiv(data->b.o, data->y.o, data->z.o);
  data->a.o = aux; data->i = div; break;
case div: if (data->z.o.l == 0 && data->z.o.h == 0) {
    data->interrupt |= D_BIT; data->a.o = data->y.o;
    data->i = set; /* divide by zero needn’t wait in the pipeline */
  } else {
    data->x.o = signed_odiv(data->y.o, data->z.o);
    if (overflow) data->interrupt |= V_BIT;
    data->a.o = aux;
  }
  break;
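
The loop at quantify_mul picks one of the internal opcodes mul0 through mul8 according to how many bytes of the second operand are significant. A standalone sketch of that count, using a 64-bit integer in place of an octa and an invented function name:

```c
#include <stdint.h>

/* Hypothetical helper: number of significant bytes of z, from 0 to 8.
   Since mul0 = 0 in section 49, this equals the internal opcode chosen. */
static int mul_class(uint64_t z)
{
    int j = 0;
    while (z != 0) {
        z >>= 8;    /* unsigned shift, like shift_right(aux, 8, 1) */
        j++;
    }
    return j;
}
```

A multiplier that fits in one byte therefore selects mul1, while one with any of its top eight bits set selects mul8, which lets a pipeline configuration charge different latencies for each class.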



a: specnode, §44. 

aux\ octa, mmix-ARITH §4. 

h: spec, §44. 

cool: control *, §60. 

D_BIT = 1 < 15, §54. 
data: register control *, 
§124. 

div = 9, §49. 

divu = 28, §49. 

false = 0, §11. 

fin.ex: label, §144. 

g: specnode [], §86. 

h: tetra, §17. 

head: fetch §69. 

i: internal.opcode, §44. 

inst: tetra, §68. 

interim: bool, §44. 

interrupt: unsigned int, §44. 

j: register int, §12. 

1: tetra, §17. 



mul = 26, §49. 
mulO = 0, §49. 
mull = 1, §49. 
mul8 = 8, §49. 
mulu = 27, §49. 
o: octa, §40. 

odiv: octa (), MMIX-ARITH §13. 
old.hot: control *, §60. 
omult: octa (), 

MMIX-ARITH §8. 
overflow: bool, 

MMIX-ARITH §4. 
pack.butes = m&CTO (), §320. 
rA = 21, §52. 
ren.a: bool, §44. 
rG = 19, §52. 
rP = 23, §52. 
rR = 6, §52. 
save = 77, §49. 



SAVE = #fa, §47. 
set = 33, §49. 
shift.right: octa (), 
MMIX-ARITH §7. 
signed.odiv: octa (), 
MMIX-ARITH §24. 
signed.omult: octa (), 
MMIX-ARITH §12. 
spec.install: static void (), 
§95. 

true = 1, §11. 

V_BIT = 1 < 14, §54. 
wait = macro (), §125. 
x: specnode, §44. 

XX : unsigned char, §44. 

y: spec, §44. 

yy: unsigned char, §44. 

2 :: spec, §44. 

zz: unsigned char, §44. 
344. Next let’s polish off the bitwise and bytewise operations. 

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case sadd:
  data->x.o.l = count_bits(data->y.o.h & ~data->z.o.h) + count_bits(data->y.o.l & ~data->z.o.l);
  break;
case mor: data->x.o = bool_mult(data->y.o, data->z.o, data->op & 0x2); break;
case bdif: data->x.o.h = byte_diff(data->y.o.h, data->z.o.h);
  data->x.o.l = byte_diff(data->y.o.l, data->z.o.l); break;
case wdif: data->x.o.h = wyde_diff(data->y.o.h, data->z.o.h);
  data->x.o.l = wyde_diff(data->y.o.l, data->z.o.l); break;
case tdif: if (data->y.o.h > data->z.o.h) data->x.o.h = data->y.o.h - data->z.o.h;
tdif_l: if (data->y.o.l > data->z.o.l) data->x.o.l = data->y.o.l - data->z.o.l; break;
case odif: if (data->y.o.h > data->z.o.h) data->x.o = ominus(data->y.o, data->z.o);
  else if (data->y.o.h == data->z.o.h) goto tdif_l;
  break;
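
The saturating differences and the sideways add are easy to restate on 64-bit integers. The sketch below is meant only to show the intended results; the function names are invented, and the simulator reaches the same values by working on the two tetrabyte halves with x already zero.

```c
#include <stdint.h>

/* Hypothetical helper: octabyte difference, saturating at zero. */
static uint64_t odif64(uint64_t y, uint64_t z)
{
    return y > z ? y - z : 0;
}

/* Hypothetical helper: tetrabyte differences, each half saturating at zero. */
static uint64_t tdif64(uint64_t y, uint64_t z)
{
    uint32_t yh = (uint32_t)(y >> 32), yl = (uint32_t)y;
    uint32_t zh = (uint32_t)(z >> 32), zl = (uint32_t)z;
    uint32_t h = yh > zh ? yh - zh : 0;
    uint32_t l = yl > zl ? yl - zl : 0;
    return ((uint64_t)h << 32) | l;
}

/* Hypothetical helper: number of bit positions where y has 1 and z has 0. */
static int sadd64(uint64_t y, uint64_t z)
{
    uint64_t v = y & ~z;
    int n = 0;
    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}
```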

345. The conditional set (CS) instructions are, rather surprisingly, more difficult to 
implement than the zero set (ZS) instructions, although the ZS instructions do more. 
The reason is that dynamic instruction dependencies are more complicated with CS. 
Consider, for example, the instructions 

LDO x,a,b; FDIV y,c,d; CSZ y,x,0; INCL y,1. 

If the value of x is zero, the INCL instruction need not wait for the division to be 
completed. (We do not, however, abort the division in such a case; it might invoke 
a trip handler, or change the inexact bit, etc. Our policy is to treat common cases 
efficiently and to treat all cases correctly, but not to treat all cases with maximum 
efficiency.) 

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case zset: if (register_truth(data->y.o, data->op)) data->x.o = data->z.o;
  /* otherwise data->x.o is already zero */
  goto fin_ex;
case cset: if (register_truth(data->y.o, data->op)) data->x.o = data->z.o, data->b.p = NULL;
  else if (data->b.p == NULL) data->x.o = data->b.o;
  else {
    data->state = 0; data->need_b = true; goto switch1;
  }
  break;

346. Floating point computations are mostly handled by the routines in 
MMIX-ARITH, which record anomalous events in the global variable exceptions. But we 
consider the operation trivial if an input is infinite or NaN; and we may need to 
increase the execution time when subnormals are present. 

#define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4
#define is_subnormal(x) ((x.h & 0x7ff00000) == 0 && ((x.h & 0xfffff) || x.l))
#define is_trivial(x) ((x.h & 0x7ff00000) == 0x7ff00000)
#define set_round cur_round = (data->ra.o.l < 0x10000 ? ROUND_NEAR : data->ra.o.l >> 16)

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case fadd: set_round; data->x.o = fplus(data->y.o, data->z.o);
fin_bflot: if (is_subnormal(data->y.o)) data->denin = denin_penalty;
fin_uflot: if (is_subnormal(data->x.o)) data->denout = denout_penalty;
fin_flot: if (is_subnormal(data->z.o)) data->denin = denin_penalty;
  data->interrupt |= exceptions;
  if (is_trivial(data->y.o) || is_trivial(data->z.o)) goto fin_ex;
  if (data->i == fsqrt && (data->z.o.h & sign_bit)) goto fin_ex;
  break;
case fsub: data->a.o = data->z.o;
  if (fcomp(data->z.o, zero_octa) != 2) data->a.o.h ^= sign_bit;
  set_round; data->x.o = fplus(data->y.o, data->a.o);
  data->i = fadd; /* use pipeline times for addition */
  goto fin_bflot;
case fmul: set_round; data->x.o = fmult(data->y.o, data->z.o); goto fin_bflot;
case fdiv: set_round; data->x.o = fdivide(data->y.o, data->z.o); goto fin_bflot;
case fsqrt: set_round; data->x.o = froot(data->z.o, data->y.o.l); goto fin_uflot;
case fint: set_round; data->x.o = fintegerize(data->z.o, data->y.o.l); goto fin_uflot;
case fix: set_round; data->x.o = fixit(data->z.o, data->y.o.l);
  if (data->op & 0x2) exceptions &= ~W_BIT; /* unsigned case doesn’t overflow */
  goto fin_flot;
case flot: set_round; data->x.o = floatit(data->z.o, data->y.o.l, data->op & 0x2, data->op & 0x4);
  data->interrupt |= exceptions; break;
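
The classification macros of this section test only the exponent and fraction fields of an IEEE double. Here is a sketch of the same tests as functions on a 64-bit bit pattern; the function forms and the name `round_mode` are invented for this example.

```c
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* Exponent field zero and fraction nonzero: a subnormal number. */
static int is_subnormal(uint64_t x)
{
    uint32_t h = (uint32_t)(x >> 32), l = (uint32_t)x;
    return (h & 0x7ff00000u) == 0 && ((h & 0xfffffu) || l);
}

/* Exponent field all ones: infinity or NaN. */
static int is_trivial(uint64_t x)
{
    return ((uint32_t)(x >> 32) & 0x7ff00000u) == 0x7ff00000u;
}

/* Hypothetical helper: rounding mode selected by the low tetrabyte of rA,
   defaulting to ROUND_NEAR when the mode bits are zero. */
static int round_mode(uint32_t ra_l)
{
    return ra_l < 0x10000u ? ROUND_NEAR : (int)(ra_l >> 16);
}
```

Zero itself is neither subnormal nor trivial under these tests, so it incurs no denin or denout penalty.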

347. ⟨Special cases of instruction dispatch 117⟩ +≡
case fsqrt: case fint: case fix: case flot: if (cool->y.o.l > 4) goto illegal_inst;
  break;



a: specnode, §44. 
b: spec, §44. 
bdif = 48, §49. 
booLmult: octa {), 
MMIX-ARITH §29. 
byte-diff: tetra (), 
MMIX-ARITH §27. 
cool: control *, §60. 
count J)its: int (), 

MMIX-ARITH §26. 
cset = 53, §49. 
curjTound: int, 

MMIX-ARITH §30. 
data: register control *, 

§124. 

denin: int, §44. 
deninjpenalty : int, §349. 
denout: int, §44. 
denout^penalty : int, §349. 
exceptions: int, 

MMIX-ARITH §32. 
fadd = 14, §49. 

fcomp: int (), mmix-arith §85. 
fdiv = 16, §49. 
f divide: octa (), 

MMIX-ARITH §44. 



fin^ex: label, §144. 
fint = 18, §49. 
fintegerize: octa (), 
MMIX-ARITH §86. 

fix = 19, §49. 

fixit: octa (), MMIX-ARITH §88. 
floatit: octa (), 

MMIX-ARITH §89. 
flot =20, §49. 
fmul = 15, §49. 
fmult: octa (), 

MMIX-ARITH §41. 
fplus: octa (), 

MMIX-ARITH §46. 
froot: octa (), 

MMIX-ARITH §91. 
fsqrt = 17, §49. 
fsub = 24:, §49. 
h: tetra, §17. 
i: internaLopcode, §44. 
illegaLinst: label, §118. 
interrupt: unsigned int, §44. 

1: tetra, §17. 
mor = 13, §49. 
needjy. bool, §44. 



o: octa, §40. 
odif = 51, §49. 
ominus: octa (), 
MMIX-ARITH §5. 
op: mmix.opcode, §44. 
p: specnode *, §40. 
ra: spec, §44. 

register. truth: static int (), 
§157. 

sadd = 12, §49. 
sign.bit = macro, §80. 
state: int, §44. 
switchl : label, §130. 
tdif = 50, §49. 
true = 1, §11. 

W_BIT = 1 < 13, §54. 
wdif = 49, §49. 
wyde.diff: tetra (), 
MMIX-ARITH §28. 
x: specnode, §44. 
y: spec, §44. 

2 :: spec, §44. 
zero.octa: octa, 
MMIX-ARITH §4. 
zset = 52, §49. 
348. ⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case feps: j = fepscomp(data->y.o, data->z.o, data->b.o, data->op != FEQLE);
  if (j == 2) data->i = fcmp;
  else if (is_subnormal(data->y.o) || is_subnormal(data->z.o)) data->denin = denin_penalty;
  switch (data->op) {
  case FUNE: if (j == 2) goto cmp_pos; else goto cmp_zero;
  case FEQLE: goto cmp_fin;
  case FCMPE: if (j) goto cmp_zero_or_invalid;
  }
case fcmp: j = fcomp(data->y.o, data->z.o);
  if (j < 0) goto cmp_neg;
cmp_fin: if (j == 1) goto cmp_pos;
cmp_zero_or_invalid: if (j == 2) data->interrupt |= I_BIT;
  goto cmp_zero;
case funeq: if (fcomp(data->y.o, data->z.o) == (data->op == FUN ? 2 : 0)) goto cmp_pos;
  else goto cmp_zero;

349. ⟨External variables 4⟩ +≡
Extern int frem_max;
Extern int denin_penalty, denout_penalty;

350. The floating point remainder operation is especially interesting because it can 
be interrupted when it’s in the hot seat. 

⟨Cases to compute the results of register-to-register operation 137⟩ +≡
case frem: if (is_trivial(data->y.o) || is_trivial(data->z.o)) {
    data->x.o = fremstep(data->y.o, data->z.o, 2500);
    data->interrupt |= exceptions; goto fin_ex;
  }
  if ((self + 1)->next) wait(1);
  data->interim = true;
  j = 1;
  if (is_subnormal(data->y.o) || is_subnormal(data->z.o)) j += denin_penalty;
  pass_after(j);
  goto passit;






MMIX-PIPE: MORE REGISTER-TO-REGISTER OPS 



351. ⟨Begin execution of a stage-two operation 351⟩ ≡
j = 1;
if (data->i == frem) {
  data->x.o = fremstep(data->y.o, data->z.o, frem_max);
  if (exceptions & E_BIT) {
    data->y.o = data->x.o;
    if (trying_to_interrupt && data == old_hot) goto fin_ex;
  } else {
    data->state = 3;
    data->interim = false;
    data->interrupt |= exceptions;
    if (is_subnormal(data->x.o)) j += denout_penalty;
  }
  wait(j);
}

This code is used in section 135. 



b: spec, §44. 
cmp^neg: label, §143. 
cmp^pos: label, §143. 
cmp.zero: label, §143. 
data: register control *, 

§124. 

denin: int, §44. 

E_BIT = 1 < 18, §54. 
exceptions: int, 

MMIX-ARITH §32. 

Extern = macro, §4. 
false = 0, §11. 
fcmp = 22, §49. 

FCMPE = =^11, §47. 

fcomp: int (), mmix-arith §85. 

feps = 21, §49. 

fepscomp: int (), 



MMIX-ARITH §50. 

FEQLE = ^13, §47. 
fin.ex: label, §144. 
frem = 25, §49. 
fremstep: octa (), 
MMIX-ARITH §93. 

FUN = #02, §47. 

FUNE = #12, §47. 
funeq = 23, §49. 
i: internal.opcode, §44. 
I_BIT = 1 < 12, §54. 
interim: bool, §44. 
interrupt: unsigned int, §44. 
is.subnormal = macro (), §346. 
is.trivial = ma.cYO (), §346. 
j: register int, §12. 



next: coroutine *, §23. 
o: octa, §40. 
old.hot: control *, §60. 
op: mmix_opcode, §44. 
pass.after = macro (), §125. 
passit: label, §134. 
self: register coroutine *, 
§124. 

state: int, §44. 
true = 1, §11. 

trying.to.interrupt: bool, 
§315. 

wait = macro (), §125. 
x: specnode, §44. 
y: spec, §44. 
spec, §44. 
352. System operations. Finally we need to implement some operations for the 
operating system; then the hardware simulation will be done! 

A LDVTS instruction is delayed until it reaches the hot seat, because it changes 
the IT and DT caches. The operating system should use SYNC after LDVTS if the 
effects are needed immediately; the system is also responsible for ensuring that the 
page table permission bits agree with the LDVTS permission bits when the latter are 
nonzero. (Also, if write permission is taken away from a page, the operating system 
must have previously used SYNCD to write out any dirty bytes that might have been 
cached from that page; SYNCD will be inoperative after write permission goes away.) 

⟨Handle special cases for operations like prego and ldvts 289⟩ +≡
if (data->i == ldvts) ⟨Do stage 1 of LDVTS 353⟩;

353. ⟨Do stage 1 of LDVTS 353⟩ ≡
{
  if (data != old_hot) wait(1);
  if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
  startup(&DTcache->reader[j], DTcache->access_time);
  data->z.o.h = 0, data->z.o.l = data->y.o.l & 0x7;
  p = cache_search(DTcache, data->y.o); /* N.B.: Not trans_key(data->y.o) */
  if (p) {
    data->x.o.l = 2;
    if (data->z.o.l) {
      p = use_and_fix(DTcache, p);
      p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
    } else {
      p = demote_and_fix(DTcache, p);
      p->tag.h |= sign_bit; /* invalidate the tag */
    }
  }
  pass_after(DTcache->access_time); goto passit;
}

This code is used in section 352. 

354. ⟨Special cases for states in later stages 272⟩ +≡
case ld_st_launch: if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
  startup(&ITcache->reader[j], ITcache->access_time);
  p = cache_search(ITcache, data->y.o); /* N.B.: Not trans_key(data->y.o) */
  if (p) {
    data->x.o.l |= 1;
    if (data->z.o.l) {
      p = use_and_fix(ITcache, p);
      p->data[0].l = (p->data[0].l & -8) + data->z.o.l;
    } else {
      p = demote_and_fix(ITcache, p);
      p->tag.h |= sign_bit; /* invalidate the tag */
    }
  }
  data->state = 3; wait(ITcache->access_time);
355. The SYNC operation interacts with the pipeline in interesting ways. SYNC 0
and SYNC 4 are the simplest; they just lock the dispatch and wait until they get to the 
hot seat, after which the pipeline has drained. SYNC 1 and SYNC 3 put a “barrier” into 
the write buffer so that subsequent store instructions will not merge with previous 
stores. SYNC 2 and SYNC 3 lock the dispatch until all previous load instructions have 
left the pipeline. SYNC 5, SYNC 6, and SYNC 7 remove things from caches once they 
get to the hot seat. 

⟨Special cases of instruction dispatch 117⟩ +≡
case sync: if (cool->zz > 3) {
    if (!(cool->loc.h & sign_bit)) goto privileged_inst;
    if (cool->zz == 4) freeze_dispatch = true;
  } else {
    if (cool->zz != 1) freeze_dispatch = true;
    if (cool->zz & 1) cool->mem_x = true, spec_install(&mem, &cool->x);
  } break;
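The dispatch-time rules above can be tabulated in a small standalone sketch. The names `sync_dispatch` and `classify_sync` are hypothetical and do not appear in MMIX-PIPE; the sketch only mirrors the tests on zz and on the sign of the instruction's location, and it assumes that marking a SYNC as a store (mem_x) is what produces the write-buffer barrier described in the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what dispatch decides for SYNC zz (0 <= zz <= 7). */
typedef struct {
  bool privileged_fault; /* SYNC 4..7 issued from a nonnegative location */
  bool freeze_dispatch;  /* dispatch is locked until the SYNC finishes */
  bool treated_as_store; /* SYNC 1 and SYNC 3: marked like a store (assumed barrier) */
} sync_dispatch;

sync_dispatch classify_sync(unsigned zz, bool negative_loc)
{
  sync_dispatch r = {false, false, false};
  assert(zz <= 7);
  if (zz > 3) {
    if (!negative_loc) r.privileged_fault = true; /* only the system may use these */
    else if (zz == 4) r.freeze_dispatch = true;
  } else {
    if (zz != 1) r.freeze_dispatch = true;        /* SYNC 0, 2, 3 */
    if (zz & 1) r.treated_as_store = true;        /* SYNC 1, 3 */
  }
  return r;
}
```

For instance, `classify_sync(3, false)` sets both `freeze_dispatch` and `treated_as_store`, while `classify_sync(5, false)` reports a privileged-instruction fault.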

356. ⟨Cases for stage 1 execution 155⟩ +≡
case sync: switch (data->zz) {
  case 0: case 4: if (data != old_hot) wait(1);
    halted = (data->zz != 0); goto fin_ex;
  case 2: case 3: ⟨Wait if there's an unfinished load ahead of us 357⟩;
    release_lock(self, dispatch_lock);
  case 1: data->x.addr = zero_octa; goto fin_ex;
  case 5: if (data != old_hot) wait(1);
    ⟨Clean the data caches 361⟩;
  case 6: if (data != old_hot) wait(1);
    ⟨Zap the translation caches 358⟩;
  case 7: if (data != old_hot) wait(1);
    ⟨Zap the instruction and data caches 359⟩;
  }



access_time: int, §167.
addr: octa, §40.
cache_search: static cacheblock *(), §193.
cool: control *, §60.
data: register control *, §124.
data: octa *, §167.
demote_and_fix: static cacheblock *(), §199.
dispatch_lock: lockvar, §65.
DTcache: cache *, §168.
fin_ex: label, §144.
freeze_dispatch: register bool, §75.
get_reader: static int (), §183.
h: tetra, §17.
halted: bool, §12.
i: internal_opcode, §44.
ITcache: cache *, §168.
j: register int, §12.
l: tetra, §17.
ld_st_launch = 7, §265.
ldvts = 60, §49.
loc: octa, §44.
lock: lockvar, §167.
mem: specnode, §115.
mem_x: bool, §44.
o: octa, §40.
old_hot: control *, §60.
p: register cacheblock *, §258.
pass_after = macro (), §125.
passit: label, §134.
prego = 73, §49.
privileged_inst: label, §118.
reader: coroutine *, §167.
release_lock = macro (), §37.
self: register coroutine *, §124.
sign_bit = macro, §80.
spec_install: static void (), §95.
startup: static void (), §31.
state: int, §44.
sync = 79, §49.
tag: octa, §167.
trans_key = macro (), §240.
true = 1, §11.
use_and_fix: static cacheblock *(), §196.
wait = macro (), §125.
x: specnode, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
zz: unsigned char, §44.
357. ⟨Wait if there's an unfinished load ahead of us 357⟩ ≡
{
  register control *cc;
  for (cc = data; cc != hot; ) {
    cc = (cc == reorder_top ? reorder_bot : cc + 1);
    if (cc->owner && (cc->i == ld || cc->i == ldunc || cc->i == pst)) wait(1);
  }
}

This code is used in section 356. 
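The loop in section 357 walks a circular buffer from the current instruction toward the hot seat, wrapping from the top slot to the bottom one. A minimal sketch of that traversal follows; the type `slot` and the function `load_ahead` are hypothetical stand-ins for the simulator's control blocks, and the single `busy_load` flag stands for the combined test on owner and opcode.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool busy_load; } slot; /* hypothetical stand-in for control */

/* Visit the slots strictly after `from`, up to and including `hot`, in a
   circular array whose first and last slots are `bot` and `top`.
   Returns true if any visited slot holds an unfinished load. */
bool load_ahead(const slot *bot, const slot *top, const slot *from, const slot *hot)
{
  const slot *cc;
  assert(bot <= from && from <= top && bot <= hot && hot <= top);
  for (cc = from; cc != hot; ) {
    cc = (cc == top ? bot : cc + 1); /* step toward hot, wrapping at the top */
    if (cc->busy_load) return true;
  }
  return false;
}
```

In the simulator the caller waits one cycle and retries instead of returning a flag; the sketch returns a boolean only so that the traversal can be checked in isolation.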

358. Perhaps the delay should be longer here. 

⟨Zap the translation caches 358⟩ ≡
if (DTcache->lock || (j = get_reader(DTcache)) < 0) wait(1);
startup(&DTcache->reader[j], DTcache->access_time);
set_lock(self, DTcache->lock);
zap_cache(DTcache);
data->state = 10; wait(DTcache->access_time);

This code is used in section 356. 

359. ⟨Zap the instruction and data caches 359⟩ ≡
if (!Icache) {
  data->state = 11; goto switch1;
}
if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
startup(&Icache->reader[j], Icache->access_time);
set_lock(self, Icache->lock);
zap_cache(Icache);
data->state = 11; wait(Icache->access_time);

This code is used in section 356. 

360. ⟨Special cases for states in the first stage 266⟩ +≡
case 10: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (ITcache->lock || (j = get_reader(ITcache)) < 0) wait(1);
  startup(&ITcache->reader[j], ITcache->access_time);
  set_lock(self, ITcache->lock);
  zap_cache(ITcache);
  data->state = 3; wait(ITcache->access_time);
case 11: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (wbuf_lock) wait(1);
  write_head = write_tail, write_ctl.state = 0; /* zap the write buffer */
  if (!Dcache) {
    data->state = 12; goto switch1;
  }
  if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
  startup(&Dcache->reader[j], Dcache->access_time);
  set_lock(self, Dcache->lock);
  zap_cache(Dcache);
  data->state = 12; wait(Dcache->access_time);
case 12: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (!Scache) goto fin_ex;
  if (Scache->lock) wait(1);
  set_lock(self, Scache->lock);
  zap_cache(Scache);
  data->state = 3; wait(Scache->access_time);

361. ⟨Clean the data caches 361⟩ ≡
if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
⟨Wait till write buffer is empty 362⟩;
if (clean_co.next || clean_lock) wait(1);
set_lock(self, clean_lock);
clean_ctl.i = sync; clean_ctl.state = 0; clean_ctl.x.o.h = 0;
startup(&clean_co, 1);
data->state = 13;
data->interim = true;
wait(1);

This code is used in section 356. 

362. ⟨Wait till write buffer is empty 362⟩ ≡
if (write_head != write_tail) {
  if (!speed_lock) set_lock(self, speed_lock);
  wait(1);
}

This code is used in sections 361 and 364. 

363. The cleanup process might take a huge amount of time, so we must allow it to 
be interrupted. (Servicing the interruption might, of course, put more stuff into the 
cache.) 

⟨Special cases for states in the first stage 266⟩ +≡
case 13: if (!clean_co.next) {
    data->interim = false; goto fin_ex; /* it's done! */
  }
  if (trying_to_interrupt) goto fin_ex; /* accept an interruption */
  wait(1);



access_time: int, §167.
clean_co: coroutine, §230.
clean_ctl: control, §230.
clean_lock: lockvar, §230.
control = struct, §44.
data: register control *, §124.
Dcache: cache *, §168.
DTcache: cache *, §168.
false = 0, §11.
fin_ex: label, §144.
get_reader: static int (), §183.
h: tetra, §17.
hot: control *, §60.
i: internal_opcode, §44.
Icache: cache *, §168.
interim: bool, §44.
ITcache: cache *, §168.
j: register int, §12.
ld = 56, §49.
ldunc = 59, §49.
lock: lockvar, §167.
lockloc: coroutine **, §23.
next: coroutine *, §23.
o: octa, §40.
owner: coroutine *, §44.
pst = 66, §49.
reader: coroutine *, §167.
reorder_bot: control *, §60.
reorder_top: control *, §60.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
speed_lock: lockvar, §247.
startup: static void (), §31.
state: int, §44.
switch1: label, §130.
sync = 79, §49.
true = 1, §11.
trying_to_interrupt: bool, §315.
wait = macro (), §125.
wbuf_lock: lockvar, §247.
write_ctl: control, §248.
write_head: write_node *, §247.
write_tail: write_node *, §247.
x: specnode, §44.
zap_cache: void (), §181.
364. Now we consider SYNCD and SYNCID. When control comes to this part of the
program, data->y.o is a virtual address and data->z.o is the corresponding physical
address; data->xx + 1 is the number of bytes we are supposed to be syncing; data->b.o.l
is the number of bytes we can handle at once (either Icache->bb or Dcache->bb or 8192).

We need a more elaborate scheme to implement SYNCD and SYNCID than we have
used for the “hint” instructions PRELD, PREGO, and PREST, because SYNCD and SYNCID
are not merely hints. They cannot be converted into a sequence of cache-block-size
commands at dispatch time, because we cannot be sure that the starting virtual
address will be aligned with the beginning of a cache block. We need to realize that
the bytes specified by SYNCD or SYNCID might cross a virtual page boundary, possibly
with different protection bits on each page. We need to allow for interrupts. And we
also need to keep the fetch buffer empty until a user’s SYNCID has completely brought
the memory up to date.

⟨Special cases for states in later stages 272⟩ +≡
do_syncid: data->state = 30;
case 30: if (data != old_hot) wait(1);
  if (!Icache) {
    data->state = (data->loc.h & sign_bit ? 31 : 33); goto switch2;
  }
  ⟨Clean the I-cache block for data->z.o, if any 365⟩;
  data->state = (data->loc.h & sign_bit ? 31 : 33); wait(Icache->access_time);
case 31: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  ⟨Wait till write buffer is empty 362⟩;
  if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
  if (!Dcache) goto next_sync;
  ⟨Clean the D-cache block for data->z.o, if any 366⟩;
  data->state = 32; wait(Dcache->access_time);
case 32: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (!Scache) goto next_sync;
  ⟨Clean the S-cache block for data->z.o, if any 367⟩;
  data->state = 35; wait(Scache->access_time);
do_syncd: data->state = 33;
case 33: if (data != old_hot) wait(1);
  if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  ⟨Wait till write buffer is empty 362⟩;
  if (((data->b.o.l - 1) & ~data->y.o.l) < data->xx) data->interim = true;
  if (!Dcache)
    if (data->i == syncd) goto fin_ex; else goto next_sync;
  ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩;
  data->state = 34;
case 34: if (!clean_co.next) goto next_sync;
  if (trying_to_interrupt && data->interim && data == old_hot) {
    data->z.o = zero_octa; /* anticipate RESUME_CONT */
    goto fin_ex; /* accept an interruption */
  }
  wait(1);
next_sync: data->state = 35;
case 35: if (self->lockloc) *(self->lockloc) = NULL, self->lockloc = NULL;
  if (data->interim) ⟨Continue this command on the next cache block 369⟩;
  data->go.known = true;
  goto fin_ex;

365. ⟨Clean the I-cache block for data->z.o, if any 365⟩ ≡
if (Icache->lock || (j = get_reader(Icache)) < 0) wait(1);
startup(&Icache->reader[j], Icache->access_time);
set_lock(self, Icache->lock);
p = cache_search(Icache, data->z.o);
if (p) {
  demote_and_fix(Icache, p);
  clean_block(Icache, p);
}

This code is used in section 364.

366. ⟨Clean the D-cache block for data->z.o, if any 366⟩ ≡
if (Dcache->lock || (j = get_reader(Dcache)) < 0) wait(1);
startup(&Dcache->reader[j], Dcache->access_time);
set_lock(self, Dcache->lock);
p = cache_search(Dcache, data->z.o);
if (p) {
  demote_and_fix(Dcache, p);
  clean_block(Dcache, p);
}

This code is used in section 364.

367. ⟨Clean the S-cache block for data->z.o, if any 367⟩ ≡
if (Scache->lock) wait(1);
set_lock(self, Scache->lock);
p = cache_search(Scache, data->z.o);
if (p) {
  demote_and_fix(Scache, p);
  clean_block(Scache, p);
}

This code is used in section 364.



access_time: int, §167.
b: spec, §44.
bb: int, §167.
cache_search: static cacheblock *(), §193.
clean_block: void (), §179.
clean_co: coroutine, §230.
cleanup = 91, §129.
data: register control *, §124.
Dcache: cache *, §168.
demote_and_fix: static cacheblock *(), §199.
fin_ex: label, §144.
get_reader: static int (), §183.
go: specnode, §44.
h: tetra, §17.
i: internal_opcode, §44.
Icache: cache *, §168.
interim: bool, §44.
j: register int, §12.
known: bool, §40.
l: tetra, §17.
loc: octa, §44.
lock: lockvar, §167.
lockloc: coroutine **, §23.
next: coroutine *, §23.
o: octa, §40.
old_hot: control *, §60.
p: register cacheblock *, §258.
reader: coroutine *, §167.
RESUME_CONT = 1, §320.
Scache: cache *, §168.
self: register coroutine *, §124.
set_lock = macro (), §37.
sign_bit = macro, §80.
startup: static void (), §31.
state: int, §44.
switch2: label, §135.
syncd = 64, §49.
true = 1, §11.
trying_to_interrupt: bool, §315.
wait = macro (), §125.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
zero_octa: octa, MMIX-ARITH §4.
368. ⟨Use cleanup on the cache blocks for data->z.o, if any 368⟩ ≡
if (clean_co.next || clean_lock) wait(1);
set_lock(self, clean_lock);
clean_ctl.i = syncd;
clean_ctl.state = 4;
clean_ctl.x.o.h = data->loc.h & sign_bit;
clean_ctl.z.o = data->z.o;
schedule(&clean_co, 1, 4);

This code is used in section 364. 

369. We use the fact that cache block sizes are divisors of 8192.

⟨Continue this command on the next cache block 369⟩ ≡
{
  data->interim = false;
  data->xx -= ((data->b.o.l - 1) & ~data->y.o.l) + 1;
  data->y.o = incr(data->y.o, data->b.o.l);
  data->y.o.l &= -data->b.o.l;
  data->z.o.l = (data->z.o.l & -8192) + (data->y.o.l & 8191);
  if ((data->y.o.l & 8191) == 0) goto square_one;
      /* maybe crossed a page boundary */
  if (data->i == syncd) goto do_syncd; else goto do_syncid;
}

This code is used in section 364. 
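The block-stepping arithmetic of section 369 can be checked on its own with a small sketch. The type `sync_range` and both function names are hypothetical; `y` stands for the low tetra of the virtual address, `xx` for the remaining byte count minus one, and `b` for the block size, which is assumed to be a power of 2.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t y; int xx; } sync_range; /* hypothetical model */

/* Number of bytes from address y through the end of its b-byte block. */
uint32_t bytes_left_in_block(uint32_t y, uint32_t b)
{
  assert(b != 0 && (b & (b - 1)) == 0); /* b must be a power of 2 */
  return ((b - 1) & ~y) + 1;
}

/* Charge the rest of the current block against xx, then move y to the
   first byte of the following block. */
sync_range next_block(sync_range r, uint32_t b)
{
  r.xx -= (int)bytes_left_in_block(r.y, b);
  r.y = (r.y + b) & ~(b - 1);
  return r;
}
```

With 64-byte blocks, an address whose low bits are #1234 has 12 bytes left in its block, and the next block begins at #1240. The sketch writes the mask as `~(b - 1)`, which equals the `-b` used in the section for any power of 2.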

370. If the first page lacks proper protection, we still must try the second, in the
rare case that a page boundary is spanned.

⟨Special cases for states in later stages 272⟩ +≡
sync_check: if ((data->y.o.l ^ (data->y.o.l + data->xx)) >= 8192) {
    data->xx -= (8191 & ~data->y.o.l) + 1;
    data->y.o = incr(data->y.o, 8192);
    data->y.o.l &= -8192;
    goto square_one;
  }
  goto fin_ex;
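The test at sync_check relies on the fact that two addresses lie in different 8192-byte pages exactly when their exclusive-or has a bit set at position 13 or above. A minimal sketch, with hypothetical function names and with `xx` again meaning the byte count minus one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the bytes y, y+1, ..., y+xx do not all lie in one 8192-byte page. */
bool spans_page(uint32_t y, uint32_t xx)
{
  assert(xx <= 255); /* xx is an unsigned char in the simulator */
  return (y ^ (y + xx)) >= 8192;
}

/* Number of bytes from y through the end of its page. */
uint32_t bytes_to_page_end(uint32_t y)
{
  return (8191 & ~y) + 1;
}
```

For example, three bytes starting at offset 8190 span a page boundary, and two of them belong to the first page.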
371. Input and output. We’re done implementing the hardware, but there’s
still a small matter of software remaining, because we sometimes want to pretend
that a real operating system is present without actually having one loaded. This
simulator therefore implements a special feature: If RESUME 1 is issued in location rT,
the ten special I/O traps of MMIX-SIM are performed instantaneously behind the
scenes.

Of course all claims of accurate simulation go out the door when this feature is
used.

#define max_sys_call Ftell

⟨Type definitions 11⟩ +≡
typedef enum {
  Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
} sys_call;



b: spec, §44.
clean_co: coroutine, §230.
clean_ctl: control, §230.
clean_lock: lockvar, §230.
cleanup = 91, §129.
data: register control *, §124.
do_syncd: label, §364.
do_syncid: label, §364.
false = 0, §11.
fin_ex: label, §144.
h: tetra, §17.
i: internal_opcode, §44.
incr: octa (), MMIX-ARITH §6.
interim: bool, §44.
l: tetra, §17.
loc: octa, §44.
next: coroutine *, §23.
o: octa, §40.
schedule: static void (), §28.
self: register coroutine *, §124.
set_lock = macro (), §37.
sign_bit = macro, §80.
square_one: label, §272.
state: int, §44.
syncd = 64, §49.
wait = macro (), §125.
x: specnode, §44.
xx: unsigned char, §44.
y: spec, §44.
z: spec, §44.
372. ⟨Magically do an I/O operation, if cool->loc is rT 372⟩ ≡
if (cool->loc.l == g[rT].o.l && cool->loc.h == g[rT].o.h) {
  register unsigned char yy, zz;
  octa ma, mb;
  if (g[rXX].o.l & 0xffff0000) goto magic_done;
  yy = g[rXX].o.l >> 8, zz = g[rXX].o.l & 0xff;
  if (yy > max_sys_call) goto magic_done;
  ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩;
  switch (yy) {
  case Halt: ⟨Either halt or print warning 373⟩; break;
  case Fopen: g[rBB].o = mmix_fopen(zz, mb, ma); break;
  case Fclose: g[rBB].o = mmix_fclose(zz); break;
  case Fread: g[rBB].o = mmix_fread(zz, mb, ma); break;
  case Fgets: g[rBB].o = mmix_fgets(zz, mb, ma); break;
  case Fgetws: g[rBB].o = mmix_fgetws(zz, mb, ma); break;
  case Fwrite: g[rBB].o = mmix_fwrite(zz, mb, ma); break;
  case Fputs: g[rBB].o = mmix_fputs(zz, g[rBB].o); break;
  case Fputws: g[rBB].o = mmix_fputws(zz, g[rBB].o); break;
  case Fseek: g[rBB].o = mmix_fseek(zz, g[rBB].o); break;
  case Ftell: g[rBB].o = mmix_ftell(zz); break;
  }
magic_done: g[255].o = neg_one; /* this will enable interrupts */
}

This code is used in section 322. 
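The way section 372 picks the Y and Z fields out of the low tetra of rXX can be sketched in isolation. The struct `trap_fields` and the function `decode_trap` are hypothetical; the sketch only assumes, as the code above does, that the upper 16 bits of that tetra must be zero and that Y selects the system call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool valid; unsigned char yy, zz; } trap_fields; /* hypothetical */

/* Split the low tetra of rXX into Y and Z; reject it if the upper 16 bits
   are nonzero or if Y exceeds the largest known system call number. */
trap_fields decode_trap(uint32_t rxx_low, unsigned max_call)
{
  trap_fields t = {false, 0, 0};
  assert(max_call <= 255);
  if (rxx_low & 0xffff0000u) return t;
  t.yy = (unsigned char)((rxx_low >> 8) & 0xff);
  t.zz = (unsigned char)(rxx_low & 0xff);
  t.valid = (t.yy <= max_call);
  return t;
}
```

With `max_call` equal to 10 (the value of Ftell), the tetra #00000701 decodes to Y = 7 (Fputs) and Z = 1.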

373. ⟨Either halt or print warning 373⟩ ≡
if (!zz) halted = true;
else if (zz == 1) {
  octa trap_loc;
  trap_loc = incr(g[rWW].o, -4);
  if (!(trap_loc.h || trap_loc.l >= 0x90))
    print_trip_warning(trap_loc.l >> 4, incr(g[rW].o, -4));
}

This code is used in section 372. 

374. ⟨Global variables 20⟩ +≡
char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};

375. The input/output operations invoked by TRAPS are done by subroutines in an 
auxiliary program module called MMIX-IO. Here we need only declare those subrou- 
tines, and write three primitive interfaces on which they depend. 

376. ⟨Global variables 20⟩ +≡
extern octa mmix_fopen ARGS((unsigned char, octa, octa));
extern octa mmix_fclose ARGS((unsigned char));
extern octa mmix_fread ARGS((unsigned char, octa, octa));
extern octa mmix_fgets ARGS((unsigned char, octa, octa));
extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
extern octa mmix_fputs ARGS((unsigned char, octa));
extern octa mmix_fputws ARGS((unsigned char, octa));
extern octa mmix_fseek ARGS((unsigned char, octa));
extern octa mmix_ftell ARGS((unsigned char));
extern void print_trip_warning ARGS((int, octa));

377. ⟨Internal prototypes 13⟩ +≡
int mmgetchars ARGS((char *, int, octa, int));
void mmputchars ARGS((unsigned char *, int, octa));
char stdin_chr ARGS((void));
octa magic_read ARGS((octa));
void magic_write ARGS((octa, octa));



ARGS = macro, §6.
cool: control *, §60.
Fclose = 2, §371.
Fgets = 4, §371.
Fgetws = 5, §371.
Fopen = 1, §371.
Fputs = 7, §371.
Fputws = 8, §371.
Fread = 3, §371.
Fseek = 9, §371.
Ftell = 10, §371.
Fwrite = 6, §371.
g: specnode [], §86.
h: tetra, §17.
Halt = 0, §371.
halted: bool, §12.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
loc: octa, §44.
max_sys_call = macro, §371.
mmix_fclose: octa (), MMIX-IO §11.
mmix_fgets: octa (), MMIX-IO §14.
mmix_fgetws: octa (), MMIX-IO §16.
mmix_fopen: octa (), MMIX-IO §8.
mmix_fputs: octa (), MMIX-IO §19.
mmix_fputws: octa (), MMIX-IO §20.
mmix_fread: octa (), MMIX-IO §12.
mmix_fseek: octa (), MMIX-IO §21.
mmix_ftell: octa (), MMIX-IO §22.
mmix_fwrite: octa (), MMIX-IO §18.
neg_one: octa, MMIX-ARITH §4.
o: octa, §40.
octa = struct, §17.
print_trip_warning: void (), MMIX-IO §23.
rBB = 7, §52.
rT = 13, §52.
rW = 24, §52.
rWW = 28, §52.
rXX = 29, §52.
true = 1, §11.
378. We need to cut through all the complications of buffers and caches in order to
do magical I/O. The magic_read routine finds the current octabyte in a given physical
address by looking at the write buffer, D-cache, S-cache, and memory until finding it.

⟨Subroutines 14⟩ +≡
octa magic_read(addr)
  octa addr;
{
  register write_node *q;
  register cacheblock *p;
  for (q = write_tail; ; ) {
    if (q == write_head) break;
    if (q == wbuf_top) q = wbuf_bot; else q++;
    if ((q->addr.l & -8) == (addr.l & -8) && q->addr.h == addr.h) return q->o;
  }
  if (Dcache) {
    p = cache_search(Dcache, addr);
    if (p) return p->data[(addr.l & (Dcache->bb - 1)) >> 3];
    if (((Dcache->outbuf.tag.l ^ addr.l) & -Dcache->bb) == 0 &&
        Dcache->outbuf.tag.h == addr.h)
      return Dcache->outbuf.data[(addr.l & (Dcache->bb - 1)) >> 3];
    if (Scache) {
      p = cache_search(Scache, addr);
      if (p) return p->data[(addr.l & (Scache->bb - 1)) >> 3];
      if (((Scache->outbuf.tag.l ^ addr.l) & -Scache->bb) == 0 &&
          Scache->outbuf.tag.h == addr.h)
        return Scache->outbuf.data[(addr.l & (Scache->bb - 1)) >> 3];
    }
  }
  return mem_read(addr);
}
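Two small address computations recur throughout magic_read: deciding whether an address falls in the block named by a tag, and finding the octabyte's index within that block. The sketch below isolates them; the function names are hypothetical, only the low tetra of the address is modeled, and the block size `bb` is assumed to be a power of 2 that is at least 8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr_l lies in the bb-byte block whose tag is tag_l
   (low tetras only; the high tetras are compared separately). */
bool same_block(uint32_t tag_l, uint32_t addr_l, uint32_t bb)
{
  assert(bb >= 8 && (bb & (bb - 1)) == 0);
  return ((tag_l ^ addr_l) & ~(bb - 1)) == 0;
}

/* Index of the octabyte containing addr_l within its bb-byte block. */
uint32_t octa_index(uint32_t addr_l, uint32_t bb)
{
  assert(bb >= 8 && (bb & (bb - 1)) == 0);
  return (addr_l & (bb - 1)) >> 3;
}
```

With 64-byte blocks, address #1038 belongs to the block tagged #1000 and selects octabyte 7 of that block, while #1040 already belongs to the next block.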

379. The magic_write routine changes the octabyte in a given physical address by
changing it wherever it appears in a buffer or cache. Any “dirty” or “least recently
used” status remains unchanged. (Yes, this is magic.)

⟨Subroutines 14⟩ +≡
void magic_write(addr, val)
  octa addr, val;
{
  register write_node *q;
  register cacheblock *p;
  for (q = write_tail; ; ) {
    if (q == write_head) break;
    if (q == wbuf_top) q = wbuf_bot; else q++;
    if ((q->addr.l & -8) == (addr.l & -8) && q->addr.h == addr.h) q->o = val;
  }
  if (Dcache) {
    p = cache_search(Dcache, addr);
    if (p) p->data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
    if (((Dcache->inbuf.tag.l ^ addr.l) & -Dcache->bb) == 0 &&
        Dcache->inbuf.tag.h == addr.h)
      Dcache->inbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
    if (((Dcache->outbuf.tag.l ^ addr.l) & -Dcache->bb) == 0 &&
        Dcache->outbuf.tag.h == addr.h)
      Dcache->outbuf.data[(addr.l & (Dcache->bb - 1)) >> 3] = val;
    if (Scache) {
      p = cache_search(Scache, addr);
      if (p) p->data[(addr.l & (Scache->bb - 1)) >> 3] = val;
      if (((Scache->inbuf.tag.l ^ addr.l) & -Scache->bb) == 0 &&
          Scache->inbuf.tag.h == addr.h)
        Scache->inbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
      if (((Scache->outbuf.tag.l ^ addr.l) & -Scache->bb) == 0 &&
          Scache->outbuf.tag.h == addr.h)
        Scache->outbuf.data[(addr.l & (Scache->bb - 1)) >> 3] = val;
    }
  }
  mem_write(addr, val);
}

380. The conventions of our imaginary operating system require us to apply the
trivial memory mapping in which segment i appears in a 2^32-byte page of physical
addresses starting at 2^32 i.

⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 380⟩ ≡
if (arg_count[yy] == 3) {
  octa arg_loc;
  arg_loc = g[rBB].o;
  if (arg_loc.h & 0x9fffffff) mb = zero_octa;
  else arg_loc.h >>= 29, mb = magic_read(arg_loc);
  arg_loc = incr(g[rBB].o, 8);
  if (arg_loc.h & 0x9fffffff) ma = zero_octa;
  else arg_loc.h >>= 29, ma = magic_read(arg_loc);
}

This code is used in section 372. 
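The trivial mapping can be stated as a tiny function. In this sketch the struct `octa` is a local stand-in with the same two-tetra layout as the simulator's type, and `map_segment` is a hypothetical name; the mask #9fffffff rejects negative addresses and any address whose offset within its segment needs more than 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t h, l; } octa; /* local stand-in: high and low tetra */

/* Map a user virtual address to a physical address under the trivial
   mapping: segment i (bits 61..62 of the address) goes to 2^32 * i.
   Returns false if the address is outside the mapped range. */
bool map_segment(octa v, octa *phys)
{
  assert(phys != 0);
  if (v.h & 0x9fffffffu) return false;
  phys->h = v.h >> 29; /* the segment number, 0 through 3 */
  phys->l = v.l;
  return true;
}
```

For example, the data-segment address #2000000000000010 maps to physical address #0000000100000010.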



addr: octa, §246.
arg_count: char [], §374.
bb: int, §167.
cache_search: static cacheblock *(), §193.
cacheblock = struct, §167.
data: octa *, §167.
Dcache: cache *, §168.
g: specnode [], §86.
h: tetra, §17.
inbuf: cacheblock, §167.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
ma: octa, §372.
mb: octa, §372.
mem_read: octa (), §210.
mem_write: void (), §213.
o: octa, §246.
o: octa, §40.
octa = struct, §17.
outbuf: cacheblock, §167.
rBB = 7, §52.
Scache: cache *, §168.
tag: octa, §167.
wbuf_bot: write_node *, §247.
wbuf_top: write_node *, §247.
write_head: write_node *, §247.
write_node = struct, §246.
write_tail: write_node *, §247.
yy: register unsigned char, §372.
zero_octa: octa, MMIX-ARITH §4.
381. The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at
address addr in the simulated memory and stores them in buf, continuing until size
characters have been read or some other stopping criterion has been met. If stop < 0
there is no other criterion; if stop = 0 a null character will also terminate the process;
otherwise addr is even, and two consecutive null bytes starting at an even address will
terminate the process. The number of bytes read and stored, exclusive of terminating
nulls, is returned.

⟨Subroutines 14⟩ +≡
int mmgetchars(buf, size, addr, stop)
  char *buf;
  int size;
  octa addr;
  int stop;
{
  register char *p;
  register int m;
  octa a, x;
  if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
    fprintf(stderr, "Attempt to get characters from off the page!\n");
    return 0;
  }
  for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
    x = magic_read(a);
    if ((a.l & 0x7) || m > size - 8) ⟨Read and store one byte; return if done 382⟩
    else ⟨Read and store up to eight bytes; return if done 383⟩
  }
  return size;
}

382. ⟨Read and store one byte; return if done 382⟩ ≡
{
  if (a.l & 0x4) *p = (x.l >> (8 * ((~a.l) & 0x3))) & 0xff;
  else *p = (x.h >> (8 * ((~a.l) & 0x3))) & 0xff;
  if (!*p && stop >= 0) {
    if (stop == 0) return m;
    if ((a.l & 0x1) && *(p - 1) == '\0') return m - 1;
  }
  p++, m++, a = incr(a, 1);
}

This code is used in section 381. 

383. ⟨Read and store up to eight bytes; return if done 383⟩ ≡
{
  *p = x.h >> 24;
  if (!*p && (stop == 0 || (stop > 0 && x.h < 0x10000))) return m;
  *(p + 1) = (x.h >> 16) & 0xff;
  if (!*(p + 1) && stop == 0) return m + 1;
  *(p + 2) = (x.h >> 8) & 0xff;
  if (!*(p + 2) && (stop == 0 || (stop > 0 && (x.h & 0xffff) == 0))) return m + 2;
  *(p + 3) = x.h & 0xff;
  if (!*(p + 3) && stop == 0) return m + 3;
  *(p + 4) = x.l >> 24;
  if (!*(p + 4) && (stop == 0 || (stop > 0 && x.l < 0x10000))) return m + 4;
  *(p + 5) = (x.l >> 16) & 0xff;
  if (!*(p + 5) && stop == 0) return m + 5;
  *(p + 6) = (x.l >> 8) & 0xff;
  if (!*(p + 6) && (stop == 0 || (stop > 0 && (x.l & 0xffff) == 0))) return m + 6;
  *(p + 7) = x.l & 0xff;
  if (!*(p + 7) && stop == 0) return m + 7;
  p += 8, m += 8, a = incr(a, 8);
}

This code is used in section 381. 
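The three stopping rules of mmgetchars are easier to see in a simplified model that reads one byte at a time from a flat array. This is only a sketch: `getchars_model` is a hypothetical name, the array `mem` replaces the simulated memory, and the eight-bytes-at-once fast path of section 383 is omitted.

```c
#include <assert.h>

/* Copy up to size bytes from mem[start..] into buf.
   stop < 0:  no early termination;
   stop == 0: a null byte terminates;
   stop > 0:  two null bytes starting at an even position terminate
              (start is assumed to be even in that case).
   Returns the number of bytes stored, not counting terminating nulls. */
int getchars_model(char *buf, int size, const unsigned char *mem, int start, int stop)
{
  int m;
  assert(stop <= 0 || (start & 1) == 0);
  for (m = 0; m < size; m++) {
    buf[m] = (char)mem[start + m];
    if (!buf[m] && stop >= 0) {
      if (stop == 0) return m;
      if (((start + m) & 1) && m > 0 && buf[m - 1] == '\0') return m - 1;
    }
  }
  return size;
}
```

With stop > 0, a null byte at an even position is stored provisionally; only when the following odd position is also null does the function return, and the count then excludes both null bytes.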

384. The subroutine mmputchars(buf, size, addr) puts size characters into the
simulated memory starting at address addr.

⟨Subroutines 14⟩ +≡
void mmputchars(buf, size, addr)
  unsigned char *buf;
  int size;
  octa addr;
{
  register unsigned char *p;
  register int m;
  octa a, x;
  if (((addr.h & 0x9fffffff) || (incr(addr, size - 1).h & 0x9fffffff)) && size) {
    fprintf(stderr, "Attempt to put characters off the page!\n");
    return;
  }
  for (p = buf, m = 0, a = addr, a.h >>= 29; m < size; ) {
    if ((a.l & 0x7) || m > size - 8) ⟨Load and write one byte 385⟩
    else ⟨Load and write eight bytes 386⟩;
  }
}

385. ⟨Load and write one byte 385⟩ ≡
{
  register int s = 8 * ((~a.l) & 0x3);
  x = magic_read(a);
  if (a.l & 0x4) x.l ^= (((x.l >> s) ^ *p) & 0xff) << s;
  else x.h ^= (((x.h >> s) ^ *p) & 0xff) << s;
  magic_write(a, x);
  p++, m++, a = incr(a, 1);
}

This code is used in section 384. 
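Section 385 replaces one byte of a tetrabyte with a single exclusive-or: the difference between the old byte and the new one is computed, masked to eight bits, and shifted back into place. A minimal sketch of that step, with the hypothetical name `put_byte` and big-endian byte numbering within the tetra:

```c
#include <assert.h>
#include <stdint.h>

/* Return tetra t with the byte at big-endian position pos (0 = most
   significant, 3 = least significant) replaced by b. */
uint32_t put_byte(uint32_t t, unsigned pos, unsigned char b)
{
  unsigned s;
  assert(pos <= 3);
  s = 8 * ((~pos) & 0x3);            /* shift amount: 24, 16, 8, or 0 */
  t ^= (((t >> s) ^ b) & 0xffu) << s; /* flip exactly the bits that differ */
  return t;
}
```

For example, replacing byte 0 of #11223344 by #aa yields #aa223344, and replacing byte 3 yields #112233aa; the other three bytes are untouched because their difference bits are masked away.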



fprintf: int (), <stdio.h>.
h: tetra, §17.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
magic_read: octa (), §378.
magic_write: void (), §379.
octa = struct, §17.
stderr: FILE *, <stdio.h>.
386. ⟨Load and write eight bytes 386⟩ ≡
{
  x.h = (*p << 24) + (*(p + 1) << 16) + (*(p + 2) << 8) + *(p + 3);
  x.l = (*(p + 4) << 24) + (*(p + 5) << 16) + (*(p + 6) << 8) + *(p + 7);
  magic_write(a, x);
  p += 8, m += 8, a = incr(a, 8);
}

This code is used in section 384. 
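The eight-byte case packs bytes into the two halves of an octabyte in big-endian order. The following sketch shows the same packing with a hypothetical helper `pack_octa`; it casts each byte to `uint32_t` before shifting so that the arithmetic stays unsigned, which is a choice of this sketch and not something the section itself does.

```c
#include <assert.h>
#include <stdint.h>

/* Pack eight bytes p[0..7] into a high tetra *h and a low tetra *l,
   most significant byte first. */
void pack_octa(const unsigned char *p, uint32_t *h, uint32_t *l)
{
  assert(p != 0 && h != 0 && l != 0);
  *h = ((uint32_t)p[0] << 24) + ((uint32_t)p[1] << 16)
     + ((uint32_t)p[2] << 8) + (uint32_t)p[3];
  *l = ((uint32_t)p[4] << 24) + ((uint32_t)p[5] << 16)
     + ((uint32_t)p[6] << 8) + (uint32_t)p[7];
}
```

The bytes 01 02 03 04 05 06 07 08 thus become the tetras #01020304 and #05060708.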

387. When standard input is being read by the simulated program at the same
time as it is being used for interaction, we try to keep the two uses separate by
maintaining a private buffer for the simulated program’s StdIn. Online input is
usually transmitted from the keyboard to a C program a line at a time; therefore
an fgets operation works much better than fread when we prompt for new input.
But there is a slight complication, because fgets might read a null character before
coming to a newline character. We cannot deduce the number of characters read by
fgets simply by looking at strlen(stdin_buf).

⟨Subroutines 14⟩ +≡
char stdin_chr()
{
  register char *p;
  while (stdin_buf_start == stdin_buf_end) {
    printf("StdIn> "); fflush(stdout);
    fgets(stdin_buf, 256, stdin);
    stdin_buf_start = stdin_buf;
    for (p = stdin_buf; p < stdin_buf + 254; p++)
      if (*p == '\n') break;
    stdin_buf_end = p + 1;
  }
  return *stdin_buf_start++;
}
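The complication with embedded null characters can be seen in a short sketch that compares strlen with a scan for the newline, as stdin_chr does. The function name `line_length` is hypothetical; `limit` plays the role of the bound 254 used above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the line held in buf, counting the newline itself; the scan
   looks at no more than limit characters before giving up. */
size_t line_length(const char *buf, size_t limit)
{
  size_t k;
  assert(buf != 0);
  for (k = 0; k < limit; k++)
    if (buf[k] == '\n') break;
  return k + 1;
}
```

If the buffer holds the six characters a, b, null, c, d, newline, then strlen reports 2 but the scan reports 6, which is the number of characters that fgets actually delivered.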

388. ⟨Global variables 20⟩ +≡
char stdin_buf[256]; /* standard input to the simulated program */
char *stdin_buf_start; /* current position in that buffer */
char *stdin_buf_end; /* current end of that buffer */
389. Names of the sections.

( Allocate a slot p in the S-cache 218 ) Used in section 217 . 

( Assign a functional unit if available, otherwise goto stall 82 ) Used in section 75. 
(Begin an interruption and break 317) Used in section 146. 

( Begin execution of a stage-two operation 351 ) Used in section 135. 

( Begin execution of an operation 132 ) Used in section 130. 

( Begin fetch with known physical address 296 ) Used in section 288. 

( Begin fetch without I-cache lookup 295 ) Used in section 291. 

( Cases 0 through 4, for the D-cache 233 ) Used in section 232. 

( Cases 5 through 9, for the S-cache 234) Used in section 232. 

(Cases for control of special coroutines 126, 215, 217, 222, 224, 232, 237, 257) Used in 
section 125. 

( Cases for stage 1 execution 155, 313, 325, 327, 328, 329, 331, 356 ) Used in section 132.

( Cases to compute the results of register-to-register operation 137, 138, 139, 140, 141,
142, 143, 343, 344, 345, 346, 348, 350 ) Used in section 132.

( Cases to compute the virtual address of a memory operation 265 ) Used in sec- 
tion 132. 

( Check for a hit in pending writes 278 ) Used in section 273. 

( Check for external interrupt 314) Used in section 64. 

( Check for prest with a fully spanned cache block 275 ) Used in section 274. 

( Check for security violation, break if so 149 ) Used in section 67. 

(Check for sufficient rename registers and memory slots, or goto stall 111) Used 
in section 75. 

( Check the protection bits and get the physical address 269 ) Used in sections 268, 
270, and 272. 

( Clean the D-cache block for data->z.o, if any 366 ) Used in section 364.

( Clean the data caches 361 ) Used in section 356.

( Clean the I-cache block for data->z.o, if any 365 ) Used in section 364.

( Clean the S-cache block for data->z.o, if any 367 ) Used in section 364.

( Commit and/or deissue up to commit-max instructions 67) Used in section 64. 

( Commit the hottest instruction, or break if it’s not ready 146 ) Used in section 67. 
( Commit to memory if possible, otherwise break 256 ) Used in section 146. 

( Compute the new entry for c->inbuf and give the caller a sneak preview 245 ) Used
in section 237.

( Continue this command on the next cache block 369 ) Used in section 364.

( Convert relative address to absolute address 84 ) Used in section 75.

( Copy data from p into c->inbuf 226 ) Used in section 224.

( Copy Scache->inbuf to slot p 220 ) Used in section 217.

( Copy the data from block q to fetched 294) Used in sections 292 and 296. 



a: octa, §384.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fread: size_t (), <stdio.h>.
h: tetra, §17.
incr: octa (), MMIX-ARITH §6.
l: tetra, §17.
m: register int, §384.
magic_write: void (), §379.
p: register unsigned char *, §384.
printf: int (), <stdio.h>.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
x: octa, §384.
( Declare mmix_opcode and internal_opcode 47, 49 ) Used in section 44.

( Deissue all but the hottest command 316 ) Used in section 314 . 

( Deissue the coolest instruction 145 ) Used in section 67 . 

( Determine the flags, f, and the internal opcode, i 80 ) Used in section 75.

( Dispatch an instruction to the cool block if possible, otherwise goto stall 101 ) 
Used in section 75 . 

( Dispatch one cycle’s worth of instructions 74 ) Used in section 64 . 

( Do a simultaneous lookup in the D-cache 268 ) Used in section 267 . 

( Do a simultaneous lookup in the I-cache 292 ) Used in section 291 . 

(Do load/store stage 1 without D-cache lookup 270) Used in section 267 . 

( Do load/store stage 1 with known physical address 271 ) Used in section 266 . 

(Do load/store stage 2 without D-cache lookup 277) Used in section 273. 

( Do stage 1 of LDVTS 353 ) Used in section 352 . 

( Do the final SAVE 340 ) Used in section 339 . 

( Either halt or print warning 373 ) Used in section 372 . 

( Execute all coroutines scheduled for the current time 125 ) Used in section 64 . 

( External prototypes 9, 38, 161, 175, 178, 180, 209, 212, 252 ) Used in sections 3 and 5.
( External routines 10, 39, 162, 176, 179, 181, 210, 213, 253 ) Used in section 3.

( External variables 4, 29, 59, 60, 66, 69, 77, 86, 87, 98, 115, 136, 150, 168, 207, 211, 214, 242, 247,
284, 349 ) Used in sections 3 and 5.

( Fill Scache→inbuf with clean memory data 219 ) Used in section 217.

(Finish a CSWAP 283) Used in section 281 . 

( Finish a store command 281 ) Used in section 280 . 

( Finish execution of an operation 144) Used in section 130 . 

( Forward the new data past the D-cache if it is write-through 263 ) Used in sec- 
tion 257 . 

( Generate an instruction to save g[yy] 339) Used in section 337 . 

( Generate an instruction to unsave g[yy] 333) Used in section 332 . 

( Get ready for the next step of PREGO 229) Used in section 81 . 

( Get ready for the next step of PRELD or PREST 228 ) Used in section 81 . 

( Get ready for the next step of SAVE 341 ) Used in section 81 . 

( Get ready for the next step of UNSAVE 335) Used in section 81 . 

( Global variables 20, 36, 41, 48, 50, 51, 53, 54, 65, 70, 78, 83, 88, 99, 107, 127, 148, 154, 194, 230,
235, 238, 248, 285, 303, 305, 315, 374, 376, 388 ) Used in section 3.

( Handle an internal SAVE when it’s time to store 342 ) Used in section 281 . 

( Handle an internal UNSAVE when it’s time to load 336 ) Used in section 279 . 
(Handle interrupt at end of execution stage 307) Used in section 144 . 

( Handle special cases for operations like prego and ldvts 289, 352 ) Used in section 266.
( Handle write-around when flushing to the S-cache 221 ) Used in section 217.

( Handle write-around when writing to the D-cache 259 ) Used in section 257.

( Header definitions 6, 7, 8, 52, 57, 129, 166 ) Used in sections 3 and 5.

( Ignore the item in write_head 264 ) Used in section 257.

( Initialize everything 22, 26, 61, 71, 79, 89, 116, 128, 153, 231, 236, 249, 286 ) Used in section 10.

( Insert an instruction to advance beta and L 112 ) Used in section 110.
( Insert an instruction to advance gamma 113 ) Used in sections 110, 119, and 337.
( Insert an instruction to decrease gamma 114 ) Used in section 120.

( Insert data→b.o into the proper field of data→x.o, checking for arithmetic exceptions
if signed 282 ) Used in section 281.

( Insert dummy instruction for page table emulation 302 ) Used in section 298. 

( Insert special operands when resuming an interrupted operation 324 ) Used in 
section 103. 

(Install a new instruction into the tail position 304) Used in section 301. 

( Install default fields in the cool block 100 ) Used in section 75. 

( Install register X as the destination, or insert an internal command and goto
dispatch_done if X is marginal 110 ) Used in section 101.

( Install the operand fields of the cool block 103 ) Used in section 101.

( Internal prototypes 13, 18, 24, 27, 30, 32, 34, 42, 45, 55, 62, 72, 90, 92, 94, 96, 156, 158, 169, 171,
173, 182, 184, 186, 188, 190, 192, 195, 198, 200, 202, 204, 240, 250, 254, 377 ) Used in section 3.

( Issue j pseudo-instructions to compute a page table entry 244 ) Used in section 243.
( Issue the cool instruction 81 ) Used in section 75.

( Load and write eight bytes 386 ) Used in section 384. 

( Load and write one byte 385 ) Used in section 384. 

( Local variables 12 , 124, 258 ) Used in section 10. 

( Look at the head instruction, and try to dispatch it if j < dispatch_max 75 ) Used
in section 74.

( Look up the address in the DT-cache, and also in the D-cache if possible 267 )
Used in section 266.

( Look up the address in the IT-cache, and also in the I-cache if possible 291 ) Used
in section 288.

( Magically do an I/O operation, if cool→loc is rT 372 ) Used in section 322.

( Make sure cool_L and cool_G are up to date 102 ) Used in section 101.

(Nullify the hottest instruction 147) Used in section 146. 

( Other cases for the fetch coroutine 298, 301 ) Used in section 288. 

( Pass data to the next stage of the pipeline 134 ) Used in section 130. 

( Perform one cycle of the interrupt preparations 318 ) Used in section 64. 

( Perform one machine cycle 64 ) Used in section 10. 

( Predict a branch outcome 151 ) Used in section 85. 

( Prepare for exceptional trip handler 308 ) Used in section 307. 

( Prepare memory arguments ma = M[a] and mb = M[b] if needed 380 ) Used in
section 372.

( Prepare to emulate the page translation 309 ) Used in section 310. 

(Print all of c’s cache blocks 177) Used in section 176. 

( Read and store one byte; return if done 382 ) Used in section 381. 

(Read and store up to eight bytes; return if done 383) Used in section 381. 

( Read data into c→inbuf and wait for the bus 223 ) Used in section 222.

( Read from memory into fetched 297 ) Used in section 296. 

( Record the result of branch prediction 152 ) Used in section 75. 

( Recover from incorrect branch prediction 160 ) Used in section 155.

( Redirect the fetch if control changes at this inst 85 ) Used in section 75. 

(Restart the fetch coroutine 287) Used in sections 85, 160, 308, 309, and 316. 

( Resume an interrupted operation 323 ) Used in section 322.
( Set cool→b and/or cool→ra from special register 108 ) Used in section 103.

( Set cool→b from register X 106 ) Used in section 103.

( Set cool→y from register Y 105 ) Used in section 103.

( Set cool→z as an immediate wyde 109 ) Used in section 103.

( Set cool→z from register Z 104 ) Used in section 103.

( Set resumption registers (rB, $255) or (rBB, $255) 319 ) Used in section 318.

( Set resumption registers (rW, rX) or (rWW, rXX) 320 ) Used in section 318. 

( Set resumption registers (rY, rZ) or (rYY, rZZ) 321 ) Used in section 318. 

( Set things up so that the results become known when they should 133 ) Used in 
section 132. 

( Set up the first phase of saving 338 ) Used in section 337. 

( Set up the first phase of unsaving 334 ) Used in section 332. 

( Simulate an action of the fetch coroutine 288 ) Used in section 125. 

( Simulate later stages of an execution pipeline 135 ) Used in section 125. 

( Simulate the first stage of an execution pipeline 130 ) Used in section 125. 

( Special cases for states in later stages 272, 273, 276, 279, 280, 299, 311, 354, 364, 370 )
Used in section 135.

( Special cases for states in the first stage 266, 310, 326, 360, 363 ) Used in section 130.

( Special cases of instruction dispatch 117, 118, 119, 120, 121, 122, 227, 312, 322, 332, 337, 
347, 355 ) Used in section 101. 

( Start the S-Cache filler 225 ) Used in section 224. 

( Start up auxiliary coroutines to compute the page table entry 243 ) Used in sec- 
tion 237. 

( Subroutines 14, 19, 21, 25, 28, 31, 33, 35, 43, 46, 56, 63, 73, 91, 93, 95, 97, 157, 159, 170, 172, 174,

183, 185, 187, 189, 191, 193, 196, 199, 201, 203, 205, 208, 241, 251, 255, 378, 379, 381, 384, 387) 
Used in section 3. 

( Swap cache blocks p and q 197 ) Used in sections 196 and 205 . 

( Try to get the contents of location data→z.o in the D-cache 274 ) Used in section 273.
( Try to get the contents of location data→z.o in the I-cache 300 ) Used in section 298.
( Try to put the contents of location write_head→addr into the D-cache 261 ) Used
in section 257.

( Type definitions 11, 17, 23, 37, 40, 44, 68, 76, 164, 167, 206, 246, 371 ) Used in sections 3
and 5. 

( Undo data structures set prematurely in the cool block and break 123 ) Used in 
section 75. 

( Update IT-cache usage and check the protection bits 293 ) Used in sections 292 
and 295. 

( Update rG 330 ) Used in section 329. 

( Update the page variables 239 ) Used in section 329. 

( Use cleanup on the cache blocks for data→z.o, if any 368 ) Used in section 364.

( Wait for input data if necessary; set state = 1 if it’s there 131 ) Used in section 130. 
( Wait if there’s an unfinished load ahead of us 357 ) Used in section 356. 

( Wait till write buffer is empty 362 ) Used in sections 361 and 364. 

(Wait, if necessary, until the instruction pointer is known 290) Used in section 288. 
( Write directly from write_head to memory 260 ) Used in section 257.
( Write the data into the D-cache and set state = 4, if there’s a cache hit 262 ) Used
in section 257.

( Write the dirty data of c→outbuf and wait for the bus 216 ) Used in section 215.

( Zap the instruction and data caches 359 ) Used in section 356. 

( Zap the translation caches 358 ) Used in section 356. 

( mmix-pipe.h 5 )



MMIX-SIM 

1. Introduction. This program simulates a simplified version of the MMIX com- 
puter. Its main goal is to help people create and test MMIX programs for The Art 
of Computer Programming and related publications. It provides only a rudimen- 
tary terminal-oriented interface, but it has enough infrastructure to support a cool 
graphical user interface — which could be added by a motivated reader. (Hint, hint.) 

MMIX is simplified in the following ways: 

• There is no pipeline, and there are no caches. Thus, commands like SYNC and SYNCD 
and PREGO do nothing. 

• Simulation applies only to user programs, not to an operating system kernel. Thus,
all addresses must be nonnegative; “privileged” commands such as PUT rK,z or
RESUME 1 or LDVTS x,y,z are not allowed; instructions should be executed only
from addresses in segment 0 (addresses less than #2000000000000000). Certain
special registers remain constant: rF = 0, rK = #ffffffffffffffff, rQ = 0;
rT = #8000000500000000, rTT = #8000000600000000, rV = #369c200400000000.

• No trap interrupts are implemented, except for a few special cases of TRAP that 
provide rudimentary input-output. 

• All instructions take a fixed amount of time, given by the rough estimates stated in
the MMIX documentation. For example, MUL takes 10υ and LDB takes μ + υ; all times are
expressed in terms of μ and υ, “mems” and “oops.” The simulated clock increases by
2^32 for each μ and 1 for each υ. But the interval counter rI decreases by 1 for each υ;
and the usage count field of rU may increase by 1 (modulo 2^47) for each instruction.
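
The clock accounting in the last item can be sketched in C. This is only an illustration, assuming that mems are counted in the upper 32 bits of a 64-bit clock and oops in the lower 32 bits; the names sim_clock, charge, mems_so_far, and oops_so_far are hypothetical and do not come from the simulator, which works with 32-bit halves.

```c
#include <stdint.h>

/* Hypothetical 64-bit clock: mems (mu) in the upper half, oops (upsilon) in the lower half. */
typedef struct { uint64_t clock; } sim_clock;

/* Charge an instruction that costs `mems` mu and `oops` upsilon. */
static void charge(sim_clock *c, uint32_t mems, uint32_t oops) {
  c->clock += ((uint64_t)mems << 32) + oops;
}

/* Read the two counts back out of the combined clock. */
static uint32_t mems_so_far(const sim_clock *c) { return (uint32_t)(c->clock >> 32); }
static uint32_t oops_so_far(const sim_clock *c) { return (uint32_t)(c->clock & 0xffffffffu); }
```

Charging one MUL (10υ) and one LDB (μ + υ) to a zeroed clock leaves one mem and eleven oops on the clock.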

2. To run this simulator, assuming UNIX conventions, you say ‘mmix ⟨options⟩
progfile args...’, where progfile is an output of the MMIXAL assembler, args...
is a sequence of optional command line arguments passed to the simulated program,
and ⟨options⟩ is any subset of the following:

• -t<n> Trace each instruction the first n times it is executed. (The notation <n> 
in this option, and in several other options and interactive commands below, stands 
for a decimal integer.) 

• -e<x> Trace each instruction that raises an arithmetic exception belonging to the
given bit pattern. (The notation <x> in this option, and in several other commands
below, stands for a hexadecimal integer.) The exception bits are DVWIOUZX as they
appear in rA, namely #80 for D (integer divide check), #40 for V (integer overflow),
..., #01 for X (floating inexact). The option -e by itself is equivalent to -eff, tracing
all eight exceptions.

• -r Trace details of the register stack. This option shows all the “hidden” loads 
and stores that occur when octabytes are written from the ring of local registers into 
memory, or read from memory into that ring. It also shows the full details of SAVE 
and UNSAVE operations. 

• -l<n> List the source line corresponding to each traced instruction, filling gaps
of length n or less. For example, if one instruction came from line 10 of the source
file and the next instruction to be traced came from line 12, line 11 would be shown
also, provided that n ≥ 1. If <n> is omitted it is assumed to be 3.

• -s Show statistics of running time with each traced instruction. 

• -P Show the program profile (that is, the frequency counts of each instruction 
that was executed) when the simulation ends. 

• -L<n> List the source lines corresponding to each instruction that appears in the 
program profile, filling gaps of length n or less. This option implies -P. If <n> is 
omitted it is assumed to be 3. 

• -v Be verbose: Turn on all options. (More precisely, the -v option is shorthand
for -t9999999999 -e -r -s -l10 -L10.)

• -q Be quiet: Cancel all previously specified options.

• -i Go into interactive mode before starting the simulation.

• -I Go into interactive mode when the simulated program halts or pauses for a
breakpoint.

• -b<n> Set the buffer size of source lines to max(72,n).

• -c<n> Set the capacity of the local register ring to max(256, n); this number must
be a power of 2.

• -f<filename> Use the named file for standard input to the simulated program. 
This option should be used whenever the simulator is not being used interactively, 
because the simulator will not recognize end of file when standard input has been 
defined in any other way. 

• -D<filename> Prepare the named file for use by other simulators, instead of 
actually doing a simulation. 

• -? Print the “Usage” message, which summarizes the command line options. 

The author recommends -t2 -l -L for initial offline debugging.

While the program is being simulated, an interrupt signal (usually control-C) will 
cause the simulator to break and go into interactive mode after tracing the current 
instruction, even if -i and -I were not specified on the command line. 
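
The bit pattern accepted by the -e option can be decoded with a few lines of C. The sketch below only illustrates the DVWIOUZX ordering described above; the names exc_letters and exc_names are hypothetical and are not part of the simulator.

```c
/* Letters in the order the bits appear in rA: #80 is D, ..., #01 is X. */
static const char exc_letters[] = "DVWIOUZX";

/* Write into out (at least 9 bytes) the letters selected by an -e<x> mask. */
static void exc_names(unsigned mask, char *out) {
  int k, n = 0;
  for (k = 0; k < 8; k++)
    if (mask & (0x80u >> k)) out[n++] = exc_letters[k];
  out[n] = '\0';
}
```

With this decoding, -e41 would select V and X, and -eff (or plain -e) selects all eight letters.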
3. In interactive mode, the user is prompted ‘mmix>’ and a variety of commands
can be typed online. Any command line option can be given in response to such a
prompt (including the ‘-’ that begins the option), and the following operations are
also available:

• Simply typing ⟨return⟩ or n⟨return⟩ to the mmix> prompt causes one MMIX instruction
to be executed and traced; then the user is prompted again.

• c continues simulation until the program halts or reaches a breakpoint. (Actually
the command is ‘c⟨return⟩’, but we won’t bother to mention the ⟨return⟩ in the
following description.)

• q quits (terminates the simulation), after printing the profile (if it was requested) 
and the final statistics. 

• s prints out the current statistics (the clock times and the current instruction 
location). We have already discussed the -s option on the command line, which 
causes these statistics to be printed automatically; but a lot of statistics can fill up a
lot of file space, so users may prefer to see the statistics only on demand. 

• l<n><t>, g<n><t>, $<n><t>, rA<t>, rB<t>, ..., rZZ<t>, and M<x><t> will show
the current value of a local register, global register, dynamically numbered register,
special register, or memory location. Here <t> specifies the type of value to be
displayed; if <t> is ‘!’, the value will be given in decimal notation; if <t> is ‘.’ it
will be given in floating point notation; if <t> is ‘#’ it will be given in hexadecimal,
and if <t> is ‘"’ it will be given as a string of eight one-byte characters. Just typing
<t> by itself will repeat the most recently shown value, perhaps in another format;
for example, the command ‘l10#’ will show local register 10 in hexadecimal notation,
then the command ‘!’ will show it in decimal and ‘.’ will show it as a floating point
number. If <t> is empty, the previous type will be repeated; the default type is
decimal. Register rA is equivalent to g21, according to the numbering used in GET
and PUT commands.

The ‘<t>’ in any of these commands can also have the form ‘=<value>’, where
the value is a decimal or floating point or hexadecimal or string constant. (The
syntax rules for floating point constants appear in MMIX-ARITH. A string constant
is treated as in the BYTE command of MMIXAL, but padded at the left with zeros if
fewer than eight characters are specified.) This assigns a new value before displaying
it. For example, ‘l10=.1e3’ sets local register 10 equal to 100; ‘g250="ABCD",#a’
sets global register 250 equal to #000000414243440a; ‘M1000=-Inf’ sets M8[#1000] =
#fff0000000000000, the representation of −∞. Special registers other than rI cannot
be set to values disallowed by PUT. Marginal registers cannot be set to nonzero values.

The command ‘rI=250’ sets the interval counter to 250; this will cause a break in
simulation after 250υ have elapsed.

• +<n><t> shows the next n octabytes following the one most recently shown, in
format <t>. For example, after ‘l10#’ a subsequent ‘+30’ will show l11, l12, ...,
l40 in hexadecimal notation. After ‘g200=3’ a subsequent ‘+30’ will set g201, g202,
..., g230 equal to 3, but a subsequent ‘+30!’ would merely display g201 through g230
in decimal notation. Memory addresses will advance by 8 instead of by 1. If <n> is
empty, the default value n = 1 is used.

• @<x> sets the address of the next tetrabyte to be simulated, sort of like a GO
command.

• t<x> says that the instruction in tetrabyte location x should always be traced, 
regardless of its frequency count. 

• u<x> undoes the effect of t<x>. 

• b[rwx]<x> sets breakpoints at tetrabyte x; here [rwx] stands for any subset of the
letters r, w, and/or x, meaning to break when the tetrabyte is read, written, and/or
executed. For example, ‘bx1000’ causes a break in the simulation just after the
tetrabyte in #1000 is executed; ‘b1000’ undoes this breakpoint; ‘brwx1000’ causes
a break just after any simulated instruction loads, stores, or appears in tetrabyte
number #1000.

• T, D, P, S change the “current segment” to either Text_Segment, Data_Segment,
Pool_Segment, or Stack_Segment, respectively, namely to #0, #2000000000000000,
#4000000000000000, or #6000000000000000. The current segment, initially #0, is
added to all memory addresses in M, @, t, u, and b commands.

• B lists all current breakpoints and tracepoints.

• i<filename> reads a sequence of interactive commands from the specified file,
one command per line, ignoring blank lines. This feature can be used to set many
breakpoints or to display a number of key registers, etc. Included lines that begin with
% or i are ignored; therefore an included file cannot include another file. Included
lines that begin with a blank space are reproduced in the standard output, otherwise
ignored.

• h (help) reminds the user of the available interactive commands. 
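
The way the T, D, P, and S commands affect later addresses can be pictured with a small C sketch. It assumes only what the list above states, namely that the current segment base is added to the address typed by the user; the function names segment_base and effective_address are hypothetical.

```c
#include <stdint.h>

/* Segment bases named in the T, D, P, S commands. */
static uint64_t segment_base(char cmd) {
  switch (cmd) {
    case 'T': return 0x0000000000000000ull; /* Text_Segment */
    case 'D': return 0x2000000000000000ull; /* Data_Segment */
    case 'P': return 0x4000000000000000ull; /* Pool_Segment */
    case 'S': return 0x6000000000000000ull; /* Stack_Segment */
    default:  return 0; /* assumption of this sketch: other letters mean the text segment */
  }
}

/* Address used by a command such as M<x> once the current segment is added. */
static uint64_t effective_address(uint64_t cur_seg, uint64_t x) {
  return cur_seg + x;
}
```

For example, after the command D a subsequent ‘M1000’ would refer to address #2000000000001000 under this reading.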
4. Rudimentary I/O. Input and output are provided by the following ten primitive
system calls:

• Fopen(handle, name, mode). Here handle is a one-byte integer, name is the address
of the first byte of a string, and mode is one of the values TextRead, TextWrite,
BinaryRead, BinaryWrite, BinaryReadWrite. An Fopen call associates handle with
the external file called name and prepares to do input and/or output on that file.
It returns 0 if the file was opened successfully; otherwise returns the value -1. If
mode is TextWrite, BinaryWrite, or BinaryReadWrite, any previous contents of
the named file are discarded. If mode is TextRead or TextWrite, the file consists of
“lines” terminated by “newline” characters, and it is said to be a text file; otherwise
the file consists of uninterpreted bytes, and it is said to be a binary file.

Text files and binary files are essentially equivalent in cases where this simulator is 
hosted by an operating system derived from UNIX; in such cases files can be written 
as text and read as binary or vice versa. But with other operating systems, text files 
and binary files often have quite different representations, and certain characters with 
byte codes less than ‘␣’ are forbidden in text. Within any MMIX program, the newline
character has byte code #0a = 10.

At the beginning of a program three handles have already been opened: The 
“standard input” file StdIn (handle 0) has mode TextRead, the “standard output”
file StdOut (handle 1) has mode TextWrite, and the “standard error” file StdErr
(handle 2) also has mode TextWrite. When this simulator is being run interactively,
lines of standard input should be typed following a prompt that says ‘StdIn> ’, unless
the -f option has been used. The standard output and standard error files of the 
simulated program are intermixed with the output of the simulator itself. 

The input/output operations supported by this simulator can perhaps be understood
most easily with reference to the standard library stdio that comes with the
C language, because the conventions of C have been explained in hundreds of books.
If we declare an array FILE *file[256] and set file[0] = stdin, file[1] = stdout, and
file[2] = stderr, then the simulated system call Fopen(handle, name, mode) is essentially
equivalent to the C expression

   (file[handle] ? (file[handle] = freopen(name, mode_string[mode], file[handle])) :
      (file[handle] = fopen(name, mode_string[mode]))) ? 0 : -1,

if we predefine the values mode_string[TextRead] = "r", mode_string[TextWrite] =
"w", mode_string[BinaryRead] = "rb", mode_string[BinaryWrite] = "wb", and
mode_string[BinaryReadWrite] = "wb+".

• Fclose(handle). If the given file handle has been opened, it is closed and is no longer
associated with any file. Again the result is 0 if successful, or -1 if the file was already
closed or unclosable. The C equivalent is

   fclose(file[handle]) ? -1 : 0

with the additional side effect of setting file[handle] = NULL.

• Fread(handle, buffer, size). The file handle should have been opened with mode
TextRead, BinaryRead, or BinaryReadWrite. The next size characters are read into
MMIX’s memory starting at address buffer. If an error occurs, the value -1 - size is
returned; otherwise, if the end of file does not intervene, 0 is returned; otherwise the
negative value n - size is returned, where n is the number of characters successfully
read and stored. The statement

   fread(buffer, 1, size, file[handle]) - size

has the equivalent effect in C, in the absence of file errors.

• Fgets(handle, buffer, size). The file handle should have been opened with mode
TextRead, BinaryRead, or BinaryReadWrite. Characters are read into MMIX’s memory
starting at address buffer, until either size - 1 characters have been read and
stored or a newline character has been read and stored; the next byte in memory
is then set to zero. If an error or end of file occurs before reading is complete, the
memory contents are undefined and the value -1 is returned; otherwise the number
of characters successfully read and stored is returned. The equivalent in C is

   fgets(buffer, size, file[handle]) ? strlen(buffer) : -1

if we assume that no null characters were read in; null characters may, however,
precede a newline, and they are counted just like other characters.

• Fgetws(handle, buffer, size). This command is the same as Fgets, except that
it applies to wyde characters instead of one-byte characters. Up to size - 1 wyde
characters are read; a wyde newline is #000a. The C version, using conventions of
the ISO multibyte string extension (MSE), is approximately

   fgetws(buffer, size, file[handle]) ? wcslen(buffer) : -1

where buffer now has type wchar_t *.

• Fwrite(handle, buffer, size). The file handle should have been opened with one of
the modes TextWrite, BinaryWrite, or BinaryReadWrite. The next size characters
are written from MMIX’s memory starting at address buffer. If no error occurs, 0 is
returned; otherwise the negative value n - size is returned, where n is the number of
characters successfully written. The statement

   fwrite(buffer, 1, size, file[handle]) - size

together with fflush(file[handle]) has the equivalent effect in C.

• Fputs(handle, string). The file handle should have been opened with one of the
modes TextWrite, BinaryWrite, or BinaryReadWrite. One-byte characters are
written from MMIX’s memory to the file, starting at address string, up to but not
including the first byte equal to zero. The number of bytes written is returned, or -1
on error. The C version is

   fputs(string, file[handle]) >= 0 ? strlen(string) : -1,

together with fflush(file[handle]).

• Fputws(handle, string). The file handle should have been opened with one of the
modes TextWrite, BinaryWrite, or BinaryReadWrite. Wyde characters are written
from MMIX’s memory to the file, starting at address string, up to but not including
the first wyde equal to zero. The number of wydes written is returned, or -1 on
error. The C+MSE version is

   fputws(string, file[handle]) >= 0 ? wcslen(string) : -1

together with fflush(file[handle]), where string now has type wchar_t *.

• Fseek(handle, offset). The file handle should have been opened with one of the
modes BinaryRead, BinaryWrite, or BinaryReadWrite. This operation causes the
next input or output operation to begin at offset bytes from the beginning of the
file, if offset ≥ 0, or at -offset - 1 bytes before the end of the file, if offset < 0.
(For example, offset = 0 “rewinds” the file to its very beginning; offset = -1 moves
forward all the way to the end.) The result is 0 if successful, or -1 if the stated
positioning could not be done. The C version is

   fseek(file[handle], offset < 0 ? offset + 1 : offset,
         offset < 0 ? SEEK_END : SEEK_SET) ? -1 : 0.

If a file in mode BinaryReadWrite is used for both reading and writing, an Fseek
command must be given when switching from input to output or from output to
input.

• Ftell(handle). The file handle should have been opened with mode BinaryRead,
BinaryWrite, or BinaryReadWrite. This operation returns the current file position,
measured in bytes from the beginning, or -1 if an error has occurred. In this case
the C function

   ftell(file[handle])

has exactly the same meaning.
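
The C equivalents quoted for Fopen and Fclose can be gathered into a small self-contained sketch. The function names sketch_fopen and sketch_fclose are hypothetical, and the explicit test for an unopened handle in sketch_fclose is an assumption added here so that the sketch never passes a null pointer to fclose; it is not claimed to be how the simulator itself is written.

```c
#include <stdio.h>

enum { TextRead, TextWrite, BinaryRead, BinaryWrite, BinaryReadWrite };

static const char *mode_string[] = { "r", "w", "rb", "wb", "wb+" };
static FILE *file[256];   /* file[0..2] would hold stdin, stdout, stderr */

/* Fopen: reuse the handle if it is already open, otherwise open afresh. */
static int sketch_fopen(unsigned char handle, const char *name, int mode) {
  if (file[handle])
    file[handle] = freopen(name, mode_string[mode], file[handle]);
  else
    file[handle] = fopen(name, mode_string[mode]);
  return file[handle] ? 0 : -1;
}

/* Fclose: 0 on success, -1 if the handle was not open or could not be closed. */
static int sketch_fclose(unsigned char handle) {
  int r;
  if (!file[handle]) return -1;
  r = fclose(file[handle]) ? -1 : 0;
  file[handle] = NULL;   /* the handle is no longer associated with any file */
  return r;
}
```

Closing a handle twice returns 0 the first time and -1 the second, which matches the description of Fclose above.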

Although these ten operations are quite primitive, they provide the necessary func- 
tionality for extremely complex input/output behavior. For example, every function 
in the stdio library of C, with the exception of the two administrative operations 
remove and rename, can be implemented as a subroutine in terms of the six basic 
operations Fopen, Fclose, Fread, Fwrite, Fseek, and Ftell. 

Notice that the MMIX function calls are much more consistent than those in the 
C library. The first argument is always a handle; the second, if present, is always
an address; the third, if present, is always a size. The result returned is always
nonnegative if the operation was successful, negative if an anomaly arose. These
common features make the functions reasonably easy to remember. 
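
The return-value conventions of Fread and Fseek are easy to get wrong by one, so here is a hedged C sketch of both. The names sketch_fread and sketch_fseek are hypothetical, and a plain FILE * stands in for the handle table; only standard stdio calls are used.

```c
#include <stdio.h>

/* Fread: 0 if all size bytes arrived, n - size at end of file, -1 - size on error. */
static long sketch_fread(FILE *f, void *buffer, long size) {
  long n = (long)fread(buffer, 1, (size_t)size, f);
  if (ferror(f)) return -1 - size;
  return n - size;
}

/* Fseek: offset >= 0 counts from the start; offset < 0 means -offset-1 bytes before the end. */
static int sketch_fseek(FILE *f, long offset) {
  if (offset < 0) return fseek(f, offset + 1, SEEK_END) ? -1 : 0;
  return fseek(f, offset, SEEK_SET) ? -1 : 0;
}
```

On a five-byte file, reading four bytes when only two remain yields -2, and an offset of -3 positions the file at byte 3, two bytes before the end.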

5. The ten input/output operations of the previous section are invoked by TRAP
commands with X = 0, Y = Fopen or Fclose or ... or Ftell, and Z = Handle.
If there are two arguments, the second argument is placed in $255. If there are
three arguments, the address of the second is placed in $255; the second argument is
M8[$255] and the third argument is M8[$255 + 8]. The returned value will be in $255
when the system call is finished. (See the example below.)
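
The three-argument convention can be sketched as follows. The byte array mem is a hypothetical stand-in for MMIX's memory (big-endian, as on MMIX), and load_octa, store_octa, and fetch_three_arg are names invented for this illustration only.

```c
#include <stdint.h>

/* Hypothetical flat memory standing in for MMIX's M; octabytes are big-endian. */
static unsigned char mem[64];

static uint64_t load_octa(uint64_t addr) {
  uint64_t v = 0;
  int k;
  for (k = 0; k < 8; k++) v = (v << 8) | mem[addr + k];
  return v;
}

static void store_octa(uint64_t addr, uint64_t v) {
  int k;
  for (k = 7; k >= 0; k--) { mem[addr + k] = (unsigned char)(v & 0xff); v >>= 8; }
}

/* For a three-argument call, $255 holds the address of the argument pair. */
typedef struct { uint64_t second, third; } trap_args;

static trap_args fetch_three_arg(uint64_t reg255) {
  trap_args a;
  a.second = load_octa(reg255);      /* M8[$255] */
  a.third  = load_octa(reg255 + 8);  /* M8[$255 + 8] */
  return a;
}
```

A call such as Fread would thus find the buffer address in the first octabyte and the size in the second.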
6. The user program starts at symbolic location Main. At this time the global registers
are initialized according to the GREG statements in the MMIXAL program, and $255
is set to the numeric equivalent of Main. Local register $0 is initially set to the number
of command line arguments, and local register $1 points to the first such argument,
which is always a pointer to the program name. Each command line argument is a
pointer to a string; the last such pointer is M8[(($0 − 1) ≪ 3) + $1], and M8[($0 ≪ 3) + $1] is
zero. (Register $1 will point to an octabyte in Pool_Segment, and the command line
strings will be in that segment too.) Location M[Pool_Segment] will be the address
of the first unused octabyte of the pool segment.
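
The layout of the argument pointers can be expressed as address arithmetic. This sketch assumes that $0 counts the program name as the first argument and that the pointer list is followed directly by a zero octabyte; the function names are hypothetical.

```c
#include <stdint.h>

/* Address of argv[k], given $1 (the address of argv[0]); each pointer is one octabyte. */
static uint64_t argv_slot(uint64_t reg1, uint64_t k) {
  return reg1 + (k << 3);
}

/* Address of the last argument pointer, given $0 = argc (at least 1). */
static uint64_t last_arg_slot(uint64_t reg0, uint64_t reg1) {
  return argv_slot(reg1, reg0 - 1);
}

/* Address of the zero octabyte that terminates the pointer list. */
static uint64_t terminator_slot(uint64_t reg0, uint64_t reg1) {
  return argv_slot(reg1, reg0);
}
```

With two arguments and $1 equal to some address p in the pool segment, the last pointer sits at p + 8 and the terminating zero at p + 16.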

Registers rA, rB, rD, rE, rF, rH, rI, rJ, rM, rP, rQ, and rR are initially zero, and
rL = 2.

A subroutine library loaded with the user program might need to initialize itself. If
an instruction has been loaded into tetrabyte M4[#f0], the simulator actually begins
execution at #f0 instead of at Main; in this case $255 holds the location of Main.
(The routine at #f0 can pass control to Main without increasing rL, if it starts with
the slightly tricky sequence

PUT rW, $255; PUT rB, $255; SETML $255,#F700; PUT rX,$255 

and eventually says RESUME; this RESUME command will restore $255 and rB. But the 
user program should not really count on the fact that rL is initially 2.) 

7. The main program ends when MMIX executes the system call TRAP 0, which is 
often symbolically written ‘TRAP 0,Halt,0’ to make its intention clear. The contents 
of $255 at that time are considered to be the value “returned” by the main program, 
as in the exit statement of C; a nonzero value indicates an anomalous exit. All open 
files are closed when the program ends. 
8. Here, for example, is a complete program that copies a text file to the standard
output, given the name of the file to be copied. It includes all necessary error checking. 

* SAMPLE PROGRAM: COPY A GIVEN FILE TO STANDARD OUTPUT
t         IS    $255
argc      IS    $0
argv      IS    $1
s         IS    $2
Buf_Size  IS    1000
          LOC   Data_Segment
Buffer    LOC   @+Buf_Size
          GREG  @
Arg0      OCTA  0,TextRead
Arg1      OCTA  Buffer,Buf_Size

          LOC   #200                 main(argc,argv) {
Main      CMP   t,argc,2             if (argc==2) goto openit
          PBZ   t,Openit
          GETA  t,1F                 fputs("Usage: ",stderr)
          TRAP  0,Fputs,StdErr
          LDOU  t,argv,0             fputs(argv[0],stderr)
          TRAP  0,Fputs,StdErr
          GETA  t,2F                 fputs(" filename\n",stderr)
Quit      TRAP  0,Fputs,StdErr
          NEG   t,0,1                quit: exit(-1)
          TRAP  0,Halt,0
1H        BYTE  "Usage: ",0
          LOC   (@+3)&-4             align to tetrabyte
2H        BYTE  " filename",#a,0
Openit    LDOU  s,argv,8             openit: s=argv[1]
          STOU  s,Arg0
          LDA   t,Arg0               fopen(argv[1],"r",file[3])
          TRAP  0,Fopen,3
          PBNN  t,CopyIt             if (no error) goto copyit
          GETA  t,1F                 fputs("Can't open file ",stderr)
          TRAP  0,Fputs,StdErr
          SET   t,s                  fputs(argv[1],stderr)
          TRAP  0,Fputs,StdErr
          GETA  t,2F                 fputs("!\n",stderr)
          JMP   Quit                 goto quit
1H        BYTE  "Can't open file ",0
          LOC   (@+3)&-4             align to tetrabyte
2H        BYTE  "!",#a,0
CopyIt    LDA   t,Arg1               copyit:
          TRAP  0,Fread,3            items=fread(buffer,1,buf_size,file[3])
          BN    t,EndIt              if (items < buf_size) goto endit
          LDA   t,Arg1               items=fwrite(buffer,1,buf_size,stdout)
          TRAP  0,Fwrite,StdOut
          PBNN  t,CopyIt             if (items >= buf_size) goto copyit
Trouble   GETA  t,1F                 trouble: fputs("Trouble w...!",stderr)
          JMP   Quit                 goto quit
1H        BYTE  "Trouble writing StdOut!",#a,0
EndIt     INCL  t,Buf_Size
          BN    t,ReadErr            if (ferror(file[3])) goto readerr
          STO   t,Arg1+8
          LDA   t,Arg1               n=fwrite(buffer,1,items,stdout)
          TRAP  0,Fwrite,StdOut
          BN    t,Trouble            if (n < items) goto trouble
          TRAP  0,Halt,0             exit(0)
ReadErr   GETA  t,1F                 readerr: fputs("Trouble r...!",stderr)
          JMP   Quit                 goto quit }
1H        BYTE  "Trouble reading!",#a,0
9. Basics. To get started, we define a type that provides semantic sugar.

( Type declarations 9 ) ≡
  typedef enum {
    false, true
  } bool;

See also sections 10, 16, 38, 39, 54, 55, 59, 64, and 135. 

This code is used in section 141. 

10. This program for the 64-bit MMIX architecture is based on 32-bit integer
arithmetic, because nearly every computer available to the author at the time of
writing (1999) was limited in that way. It uses subroutines from the MMIX-ARITH module,
assuming only that type tetra represents unsigned 32-bit integers. The definition of
tetra given here should be changed, if necessary, to agree with the definition in that
module.

⟨ Type declarations 9 ⟩ +≡
typedef unsigned int tetra; /* for systems conforming to the LP-64 data model */
typedef struct {
  tetra h, l;
} octa; /* two tetrabytes make one octabyte */
typedef unsigned char byte; /* a monobyte */

11. We declare subroutines twice, once with a prototype and once with the
old-style C conventions. The following hack makes this work with new compilers as well
as the old standbys.

⟨ Preprocessor macros 11 ⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

See also sections 43 and 46. 

This code is used in section 141. 
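
The effect of the ARGS macro can be seen in a tiny standalone sketch. The function name twice is hypothetical and is not part of MMIX-SIM; the definition uses the modern prototype style, since current compilers may reject old-style definitions. Note the doubled parentheses, which let the whole parameter list travel as a single macro argument.

```c
#ifdef __STDC__
#define ARGS(list) list   /* Standard C: keep the parameter list */
#else
#define ARGS(list) ()     /* pre-ANSI C: declare with an empty list */
#endif

/* twice is a hypothetical helper used only for this illustration */
int twice ARGS((int x));  /* expands to: int twice (int x); */

int twice(int x)
{
  return 2 * x;
}
```

Under a Standard C compiler the declaration becomes a full prototype, so a call such as twice(21) is type-checked in the usual way.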

12. ⟨ Subroutines 12 ⟩ ≡
void print_hex ARGS((octa));
void print_hex(o)
  octa o;
{
  if (o.h) printf("%x%08x", o.h, o.l);
  else printf("%x", o.l);
}

See also sections 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148, 
154, 160, 162, 165, and 166. 

This code is used in section 141. 
__STDC__, Standard C.
printf: int (), <stdio.h>.
13. Most of the subroutines in MMIX-ARITH return an octabyte as a function of
two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division
inputs the high half of a dividend in the global variable aux and returns the remainder
in aux.

⟨ Subroutines 12 ⟩ +≡
extern octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
extern octa neg_one; /* neg_one.h = neg_one.l = -1 */
extern octa aux, val; /* auxiliary data */
extern bool overflow; /* flag set by signed multiplication and division */
extern int exceptions; /* bits set by floating point operations */
extern int cur_round; /* the current rounding mode */
extern char *next_char; /* where a scanned constant ended */
extern octa oplus ARGS((octa y, octa z)); /* unsigned y + z */
extern octa ominus ARGS((octa y, octa z)); /* unsigned y - z */
extern octa incr ARGS((octa y, int delta)); /* unsigned y + delta (delta is signed) */
extern octa oand ARGS((octa y, octa z)); /* y AND z */
extern octa shift_left ARGS((octa y, int s)); /* y << s, 0 <= s <= 64 */
extern octa shift_right ARGS((octa y, int s, int u)); /* y >> s, signed if !u */
extern octa omult ARGS((octa y, octa z)); /* unsigned (aux, x) = y * z */
extern octa signed_omult ARGS((octa y, octa z)); /* signed x = y * z */
extern octa odiv ARGS((octa x, octa y, octa z)); /* unsigned (x, y) / z; aux = (x, y) mod z */
extern octa signed_odiv ARGS((octa y, octa z)); /* signed x = y / z */
extern int count_bits ARGS((tetra z)); /* x = nu(z), the number of 1 bits */
extern tetra byte_diff ARGS((tetra y, tetra z)); /* half of BDIF */
extern tetra wyde_diff ARGS((tetra y, tetra z)); /* half of WDIF */
extern octa bool_mult ARGS((octa y, octa z, bool xor)); /* MOR or MXOR */
extern octa load_sf ARGS((tetra z)); /* load short float */
extern tetra store_sf ARGS((octa x)); /* store short float */
extern octa fplus ARGS((octa y, octa z)); /* floating point x = y (+) z */
extern octa fmult ARGS((octa y, octa z)); /* floating point x = y (*) z */
extern octa fdivide ARGS((octa y, octa z)); /* floating point x = y (/) z */
extern octa froot ARGS((octa, int)); /* floating point x = sqrt(z) */
extern octa fremstep ARGS((octa y, octa z, int delta)); /* floating point x rem z = y rem z */
extern octa fintegerize ARGS((octa z, int mode)); /* floating point x = round(z) */
extern int fcomp ARGS((octa y, octa z)); /* -1, 0, 1, or 2 if y < z, y = z, y > z, y || z */
extern int fepscomp ARGS((octa y, octa z, octa eps, int sim));
  /* x = sim ? [y ~ z (eps)] : [y ≈ z (eps)] */
extern octa floatit ARGS((octa z, int mode, int unsgnd, int shrt)); /* fix to float */
extern octa fixit ARGS((octa z, int mode)); /* float to fix */
extern void print_float ARGS((octa z)); /* print octabyte as floating decimal */
extern int scan_const ARGS((char *buf));
  /* val = floating or integer constant; returns the type */
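
To make the octabyte-as-two-tetrabytes convention concrete, here is a minimal sketch of unsigned addition and subtraction on such pairs. It is not the MMIX-ARITH source: the names oplus_sketch and ominus_sketch are hypothetical, and uint32_t stands in for tetra on the assumption that a tetrabyte is exactly 32 bits.

```c
#include <stdint.h>

typedef uint32_t tetra;               /* assumed: exactly 32 bits */
typedef struct { tetra h, l; } octa;  /* high half, low half */

/* Sketch of unsigned 64-bit addition: add the halves, then propagate the carry. */
octa oplus_sketch(octa y, octa z)
{
  octa x;
  x.h = y.h + z.h;
  x.l = y.l + z.l;
  if (x.l < y.l) x.h++;   /* the low half wrapped around, so carry into the high half */
  return x;
}

/* Sketch of unsigned 64-bit subtraction: subtract the halves, then propagate the borrow. */
octa ominus_sketch(octa y, octa z)
{
  octa x;
  x.h = y.h - z.h;
  x.l = y.l - z.l;
  if (x.l > y.l) x.h--;   /* the low half wrapped around, so borrow from the high half */
  return x;
}
```

The carry test relies only on wraparound of unsigned arithmetic, which is why the simulator needs nothing beyond the assumption that tetra is an unsigned 32-bit type.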
14. Here's a quick check to see if arithmetic is in trouble.

#define panic(m) { fprintf(stderr, "Panic: %s!\n", m); exit(-2); }

⟨ Initialize everything 14 ⟩ ≡
if (shift_left(neg_one, 1).h != 0xffffffff)
  panic("Incorrect implementation of type tetra");

See also sections 18, 24, 32, 41, 77, and 147. 

This code is used in section 141. 



ARGS = macro (), §11.
aux: octa, MMIX-ARITH §4.
bool = enum, §9.
bool_mult: octa (), MMIX-ARITH §29.
byte_diff: tetra (), MMIX-ARITH §27.
count_bits: int (), MMIX-ARITH §26.
cur_round: int, MMIX-ARITH §30.
exceptions: int, MMIX-ARITH §32.
exit: void (), <stdlib.h>.
fcomp: int (), MMIX-ARITH §85.
fdivide: octa (), MMIX-ARITH §44.
fepscomp: int (), MMIX-ARITH §50.
fintegerize: octa (), MMIX-ARITH §86.
fixit: octa (), MMIX-ARITH §88.
floatit: octa (), MMIX-ARITH §89.
fmult: octa (), MMIX-ARITH §41.
fplus: octa (), MMIX-ARITH §46.
fprintf: int (), <stdio.h>.
fremstep: octa (), MMIX-ARITH §93.
froot: octa (), MMIX-ARITH §91.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
l: tetra, §10.
load_sf: octa (), MMIX-ARITH §39.
neg_one: octa, MMIX-ARITH §4.
next_char: char *, MMIX-ARITH §69.
oand: octa (), MMIX-ARITH §25.
octa = struct, §10.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
oplus: octa (), MMIX-ARITH §5.
overflow: bool, MMIX-ARITH §4.
print_float: void (), MMIX-ARITH §54.
scan_const: int (), MMIX-ARITH §68.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
stderr: FILE *, <stdio.h>.
store_sf: tetra (), MMIX-ARITH §40.
tetra = unsigned int, §10.
val: octa, MMIX-ARITH §69.
wyde_diff: tetra (), MMIX-ARITH §28.
zero_octa: octa, MMIX-ARITH §4.
15. Binary-to-decimal conversion is used when we want to see an octabyte as a
signed integer. The identity ⌊(an + b)/10⌋ = ⌊a/10⌋n + ⌊((a mod 10)n + b)/10⌋ is
helpful here.

#define sign_bit ((unsigned) 0x80000000)

⟨ Subroutines 12 ⟩ +≡
void print_int ARGS((octa));
void print_int(o)
  octa o;
{
  register tetra hi = o.h, lo = o.l, r, t;
  register int j;
  char dig[20];
  if (lo == 0 && hi == 0) printf("0");
  else {
    if (hi & sign_bit) {
      printf("-");
      if (lo == 0) hi = -hi;
      else lo = -lo, hi = ~hi;
    }
    for (j = 0; hi; j++) { /* 64-bit division by 10 */
      r = ((hi % 10) << 16) + (lo >> 16);
      hi = hi / 10;
      t = ((r % 10) << 16) + (lo & 0xffff);
      lo = ((r / 10) << 16) + (t / 10);
      dig[j] = t % 10;
    }
    for (; lo; j++) {
      dig[j] = lo % 10;
      lo = lo / 10;
    }
    for (j--; j >= 0; j--) printf("%c", dig[j] + '0');
  }
}
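
The same conversion can be exercised without a terminal by writing the digits into a caller-supplied buffer. The following is a sketch only: the name octa_to_decimal and its interface are hypothetical, and uint32_t is assumed to play the role of tetra. The division step applies the identity above twice with n = 2^16, so that no intermediate value exceeds 32 bits.

```c
#include <stdint.h>
#include <string.h>

/* Writes the signed decimal form of the 64-bit value (hi, lo) into out,
   which must have room for at least 21 characters; returns out.
   Hypothetical helper, modeled on the print_int logic. */
char *octa_to_decimal(uint32_t hi, uint32_t lo, char *out)
{
  uint32_t r, t;
  char dig[20];
  int j, n = 0;
  if (hi == 0 && lo == 0) { strcpy(out, "0"); return out; }
  if (hi & 0x80000000u) {              /* negative: emit the sign, negate the pair */
    out[n++] = '-';
    if (lo == 0) hi = 0u - hi;
    else { lo = 0u - lo; hi = ~hi; }
  }
  for (j = 0; hi; j++) {               /* 64-bit division by 10, 16 bits at a time */
    r = ((hi % 10) << 16) + (lo >> 16);
    hi = hi / 10;
    t = ((r % 10) << 16) + (lo & 0xffff);
    lo = ((r / 10) << 16) + (t / 10);
    dig[j] = (char)(t % 10);           /* remainder of the whole 64-bit division */
  }
  for (; lo; j++) {                    /* the rest fits in 32 bits */
    dig[j] = (char)(lo % 10);
    lo = lo / 10;
  }
  for (j--; j >= 0; j--) out[n++] = (char)(dig[j] + '0');  /* most significant first */
  out[n] = '\0';
  return out;
}
```

For instance, the pair (1, 0) represents 2^32, and one pass of the first loop leaves hi = 0 and lo = 429496729 with the digit 6 set aside.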
16. Simulated memory. Chunks of simulated memory, 2048 bytes each, are
kept in a tree structure organized as a treap, following ideas of Vuillemin, Aragon, and
Seidel [Communications of the ACM 23 (1980), 229-239; IEEE Symp. on Foundations
of Computer Science 30 (1989), 540-546]. Each node of the treap has two keys: One,
called loc, is the base address of 512 simulated tetrabytes; it follows the conventions
of an ordinary binary search tree, with all locations in the left subtree less than the
loc of a node and all locations in the right subtree greater than that loc. The other,
called stamp, can be thought of as the time the node was inserted into the tree; all
subnodes of a given node have a larger stamp. By assigning time stamps at random,
we maintain a tree structure that almost always is fairly well balanced.

Each simulated tetrabyte has an associated frequency count and source file
reference.

⟨ Type declarations 9 ⟩ +≡
typedef struct {
  tetra tet; /* the tetrabyte of simulated memory */
  tetra freq; /* the number of times it was obeyed as an instruction */
  unsigned char bkpt; /* breakpoint information for this tetrabyte */
  unsigned char file_no; /* source file number, if known */
  unsigned short line_no; /* source line number, if known */
} mem_tetra;

typedef struct mem_node_struct {
  octa loc; /* location of the first of 512 simulated tetrabytes */
  tetra stamp; /* time stamp for treap balancing */
  struct mem_node_struct *left, *right; /* pointers to subtrees */
  mem_tetra dat[512]; /* the chunk of simulated tetrabytes */
} mem_node;

17. The stamp value is actually only pseudorandom, based on the idea of Fibonacci
hashing [see Sorting and Searching, Section 6.4]. This is good enough for our purposes,
and it guarantees that no two stamps will be identical.

⟨ Subroutines 12 ⟩ +≡
mem_node *new_mem ARGS((void));
mem_node *new_mem()
{
  register mem_node *p;
  p = (mem_node *) calloc(1, sizeof(mem_node));
  if (!p) panic("Can't allocate any more memory");
  p->stamp = priority;
  priority += 0x9e3779b9; /* ⌊2^32 (φ − 1)⌋ */
  return p;
}
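
A small sketch shows what the stamp sequence looks like. The function nth_stamp is hypothetical; it merely replays the additions that new_mem performs, starting from the seed 314159265 used for priority. Because the increment 0x9e3779b9 is odd, the sequence modulo 2^32 does not repeat until all 2^32 values have appeared, which is the sense in which no two stamps coincide.

```c
#include <stdint.h>

/* Returns the stamp that the k-th allocated node would receive (k = 0, 1, 2, ...),
   assuming the seed 314159265 and the increment 0x9e3779b9 shown above.
   Hypothetical helper for illustration only. */
uint32_t nth_stamp(uint32_t k)
{
  uint32_t priority = 314159265u;
  while (k--) priority += 0x9e3779b9u;  /* wraps around modulo 2^32 */
  return priority;
}
```

The third stamp is smaller than the second because of the wraparound, so a later node can end up above an earlier one in the treap; this scattering is what stands in for random priorities.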



ARGS = macro (), §11.
calloc: void *(), <stdlib.h>.
h: tetra, §10.
l: tetra, §10.
octa = struct, §10.
panic = macro (), §14.
printf: int (), <stdio.h>.
priority: tetra, §19.
tetra = unsigned int, §10.
18. Initially we start with a chunk for the pool segment, since the simulator will be
putting command line information there before it runs the program.

⟨ Initialize everything 14 ⟩ +≡
mem_root = new_mem();
mem_root->loc.h = 0x40000000;
last_mem = mem_root;

19. ⟨ Global variables 19 ⟩ ≡
tetra priority = 314159265; /* pseudorandom time stamp counter */
mem_node *mem_root; /* root of the treap */
mem_node *last_mem; /* the memory node most recently read or written */
octa sclock; /* simulated clock */

See also sections 25, 31, 40, 48, 52, 56, 61, 65, 76, 110, 113, 121, 129, 139, 144, and 151. 

This code is used in section 141. 

20. The mem_find routine finds a given tetrabyte in the simulated memory, inserting
a new node into the treap if necessary.

⟨ Subroutines 12 ⟩ +≡
mem_tetra *mem_find ARGS((octa));
mem_tetra *mem_find(addr)
  octa addr;
{
  octa key;
  register int offset;
  register mem_node *p = last_mem;
  key.h = addr.h;
  key.l = addr.l & 0xfffff800;
  offset = addr.l & 0x7fc;
  if (p->loc.l != key.l || p->loc.h != key.h)
    ⟨ Search for key in the treap, setting last_mem and p to its location 21 ⟩;
  return &p->dat[offset >> 2];
}
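
The two masks in mem_find split the low half of an address into a chunk key and a slot within the chunk. The sketch below isolates that arithmetic; the type chunk_slot and the function split_address are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

typedef struct {
  uint32_t key_l;  /* low half of the chunk's base address */
  int index;       /* which of the chunk's 512 tetrabytes is addressed */
} chunk_slot;      /* hypothetical helper type */

/* Splits the low 32 bits of an address the way mem_find does. */
chunk_slot split_address(uint32_t addr_l)
{
  chunk_slot s;
  s.key_l = addr_l & 0xfffff800u;           /* drop the low 11 bits: chunks are 2048 bytes */
  s.index = (int)((addr_l & 0x7fcu) >> 2);  /* byte offset of the tetrabyte, divided by 4 */
  return s;
}
```

The two low-order bits of the address are ignored, so all four bytes of a tetrabyte map to the same slot.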
21. ⟨ Search for key in the treap, setting last_mem and p to its location 21 ⟩ ≡
{ register mem_node **q;
  for (p = mem_root; p; ) {
    if (key.l == p->loc.l && key.h == p->loc.h) goto found;
    if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) p = p->left;
    else p = p->right;
  }
  for (p = mem_root, q = &mem_root; p && p->stamp < priority; p = *q) {
    if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h) q = &p->left;
    else q = &p->right;
  }
  *q = new_mem();
  (*q)->loc = key;
  ⟨ Fix up the subtrees of *q 22 ⟩;
  p = *q;
found: last_mem = p;
}

This code is used in section 20. 

22. At this point we want to split the binary search tree p into two parts based on
the given key, forming the left and right subtrees of the new node q. The effect will
be as if key had been inserted before all of p's nodes.

⟨ Fix up the subtrees of *q 22 ⟩ ≡
{
  register mem_node **l = &(*q)->left, **r = &(*q)->right;
  while (p) {
    if ((key.l < p->loc.l && key.h <= p->loc.h) || key.h < p->loc.h)
      *r = p, r = &p->left, p = *r;
    else *l = p, l = &p->right, p = *l;
  }
  *l = *r = NULL;
}

This code is used in section 21. 
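
The descent and the split can be tried out on a reduced model. The sketch below is an assumption-laden simplification, not the simulator's code: keys are plain unsigned integers instead of octabyte locations, nodes carry no data, the key is assumed to be absent from the tree, and the names node, treap_insert, treap_ok, and treap_size are hypothetical.

```c
#include <stdlib.h>

typedef struct node {
  unsigned key, stamp;
  struct node *left, *right;
} node;

static unsigned priority = 314159265u;  /* same seed as the simulator's counter */

/* Inserts key (assumed absent) into the treap at *root; returns the new node. */
node *treap_insert(node **root, unsigned key)
{
  node *p, **q, **l, **r;
  node *n = calloc(1, sizeof *n);
  if (!n) return NULL;
  n->key = key;
  n->stamp = priority;
  priority += 0x9e3779b9u;
  /* descend past every node that is older (smaller stamp) than the new one */
  for (p = *root, q = root; p && p->stamp < n->stamp; p = *q)
    q = key < p->key ? &p->left : &p->right;
  *q = n;
  /* split the displaced subtree p around key, hanging the parts under n */
  l = &n->left; r = &n->right;
  while (p) {
    if (key < p->key) { *r = p; r = &p->left; p = *r; }
    else { *l = p; l = &p->right; p = *l; }
  }
  *l = *r = NULL;
  return n;
}

/* Returns 1 if keys lie strictly between lo and hi in search-tree order
   and no node has a smaller stamp than its parent. */
int treap_ok(const node *p, long lo, long hi, unsigned parent_stamp)
{
  if (!p) return 1;
  if ((long)p->key <= lo || (long)p->key >= hi) return 0;
  if (p->stamp < parent_stamp) return 0;
  return treap_ok(p->left, lo, (long)p->key, p->stamp)
      && treap_ok(p->right, (long)p->key, hi, p->stamp);
}

int treap_size(const node *p)
{
  return p ? 1 + treap_size(p->left) + treap_size(p->right) : 0;
}
```

The split loop never compares stamps: every node of the displaced subtree is younger than the new node, so only the search-tree order has to be repaired.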



ARGS = macro (), §11.
dat: mem_tetra [], §16.
h: tetra, §10.
l: tetra, §10.
left: mem_node *, §16.
loc: octa, §16.
mem_node = struct, §16.
mem_tetra = struct, §16.
new_mem: mem_node *(), §17.
octa = struct, §10.
right: mem_node *, §16.
stamp: tetra, §16.
tetra = unsigned int, §10.






23. Loading an object file. To get the user’s program into memory, we read in 
an MMIX object, using modifications of the routines in the utility program MMOtype. 
Complete details of mmo format appear in the program for MMIXAL; a reader who 
hopes to understand this section ought to at least skim that documentation. Here we 
need to define only the basic constants used for interpretation. 



#define mm        0x98 /* the escape code of mmo format */
#define lop_quote 0x0  /* the quotation lopcode */
#define lop_loc   0x1  /* the location lopcode */
#define lop_skip  0x2  /* the skip lopcode */
#define lop_fixo  0x3  /* the octabyte-fix lopcode */
#define lop_fixr  0x4  /* the relative-fix lopcode */
#define lop_fixrx 0x5  /* extended relative-fix lopcode */
#define lop_file  0x6  /* the file name lopcode */
#define lop_line  0x7  /* the file position lopcode */
#define lop_spec  0x8  /* the special hook lopcode */
#define lop_pre   0x9  /* the preamble lopcode */
#define lop_post  0xa  /* the postamble lopcode */
#define lop_stab  0xb  /* the symbol table lopcode */
#define lop_end   0xc  /* the end-it-all lopcode */



24. We do not load the symbol table. (A more ambitious simulator could implement 
MMIXAL-style expressions for interactive debugging, but such enhancements are left to 
the interested reader.) 

⟨ Initialize everything 14 ⟩ +≡
mmo_file = fopen(mmo_file_name, "rb");
if (!mmo_file) {
  register char *alt_name = (char *) calloc(strlen(mmo_file_name) + 5, sizeof(char));
  if (!alt_name) panic("Can't allocate file name buffer");
  sprintf(alt_name, "%s.mmo", mmo_file_name);
  mmo_file = fopen(alt_name, "rb");
  if (!mmo_file) {
    fprintf(stderr, "Can't open the object file %s or %s!\n", mmo_file_name,
        alt_name);
    exit(-3);
  }
  free(alt_name);
}
byte_count = 0;



25. ⟨ Global variables 19 ⟩ +≡
FILE *mmo_file; /* the input file */
int postamble; /* have we encountered lop_post? */
int byte_count; /* index of the next-to-be-read byte */
byte buf[4]; /* the most recently read bytes */
int yzbytes; /* the two least significant bytes */
int delta; /* difference for relative fixup */
tetra tet; /* buf bytes packed big-endianwise */
26. The tetrabytes of an mmo file are stored in friendly big-endian fashion, but this
program is supposed to work also on computers that are little-endian. Therefore we
read four successive bytes and pack them into a tetrabyte, instead of reading a single
tetrabyte.

#define mmo_err \
  { \
    fprintf(stderr, "Bad object file! (Try running MMOtype.)\n"); \
    exit(-4); \
  }

⟨ Subroutines 12 ⟩ +≡
void read_tet ARGS((void));
void read_tet()
{
  if (fread(buf, 1, 4, mmo_file) != 4) mmo_err;
  yzbytes = (buf[2] << 8) + buf[3];
  tet = (((buf[0] << 8) + buf[1]) << 16) + yzbytes;
}
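
The packing step is independent of the host's byte order because it works on byte values rather than on the memory image of an integer. A minimal sketch, with the hypothetical name pack_tetra and with uint32_t assumed for tetra:

```c
#include <stdint.h>

/* Packs four bytes, most significant first, into one 32-bit value.
   Hypothetical helper mirroring the arithmetic of read_tet. */
uint32_t pack_tetra(const unsigned char b[4])
{
  uint32_t yz = ((uint32_t)b[2] << 8) + b[3];           /* the two low-order bytes */
  return ((((uint32_t)b[0] << 8) + b[1]) << 16) + yz;   /* prepend the two high-order bytes */
}
```

For example, the bytes 0x98, 0x09, 0x01, 0x01 (the escape code mm, the lopcode lop_pre, and Y = Z = 1) pack into 0x98090101 on any machine.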

27. ⟨ Subroutines 12 ⟩ +≡
byte read_byte ARGS((void));
byte read_byte()
{
  register byte b;
  if (!byte_count) read_tet();
  b = buf[byte_count];
  byte_count = (byte_count + 1) & 3;
  return b;
}

28. ⟨ Load the preamble 28 ⟩ ≡
read_tet(); /* read the first tetrabyte of input */
if (buf[0] != mm || buf[1] != lop_pre) mmo_err;
if (ybyte != 1) mmo_err;
if (zbyte == 0) obj_time = 0xffffffff;
else {
  j = zbyte - 1;
  read_tet(); obj_time = tet; /* file creation time */
  for (; j > 0; j--) read_tet();
}

This code is used in section 32. 



ARGS = macro (), §11.
byte = unsigned char, §10.
calloc: void *(), <stdlib.h>.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
fread: size_t (), <stdio.h>.
free: void (), <stdlib.h>.
j: register int, §62.
mmo_file_name = macro, §142.
obj_time: tetra, §31.
panic = macro (), §14.
sprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
tetra = unsigned int, §10.
ybyte = macro, §33.
zbyte = macro, §33.
29. ⟨ Load the next item 29 ⟩ ≡
{
  read_tet();
loop: if (buf[0] == mm)
    switch (buf[1]) {
    case lop_quote: if (yzbytes != 1) mmo_err;
      read_tet(); break;
    ⟨ Cases for lopcodes in the main loop 33 ⟩
    case lop_post: postamble = 1;
      if (ybyte || zbyte < 32) mmo_err;
      continue;
    default: mmo_err;
    }
  ⟨ Load tet as a normal item 30 ⟩;
}

This code is used in section 32.

30. In a normal situation, the newly read tetrabyte is simply supposed to be loaded
into the current location. We load not only the current location but also the current
file position, if cur_line is nonzero and cur_loc belongs to segment 0.

#define mmo_load(loc, val) ll = mem_find(loc), ll->tet ^= val

⟨ Load tet as a normal item 30 ⟩ ≡
{
  mmo_load(cur_loc, tet);
  if (cur_line) {
    ll->file_no = cur_file;
    ll->line_no = cur_line;
    cur_line++;
  }
  cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
}

This code is used in section 29.

31. ⟨ Global variables 19 ⟩ +≡
octa cur_loc; /* the current location */
int cur_file = -1; /* the most recently selected file number */
int cur_line; /* the current position in cur_file, if nonzero */
octa tmp; /* an octabyte of temporary interest */
tetra obj_time; /* when the object file was created */

32. ⟨ Initialize everything 14 ⟩ +≡
cur_loc.h = cur_loc.l = 0;
cur_file = -1;
cur_line = 0;
⟨ Load the preamble 28 ⟩;
do ⟨ Load the next item 29 ⟩ while (!postamble);
⟨ Load the postamble 37 ⟩;
fclose(mmo_file);
cur_line = 0;
33. We have already implemented lop_quote, which falls through to the normal case
after reading an extra tetrabyte. Now let's consider the other lopcodes in turn.

#define ybyte buf[2] /* the next-to-least significant byte */
#define zbyte buf[3] /* the least significant byte */

⟨ Cases for lopcodes in the main loop 33 ⟩ ≡
case lop_loc: if (zbyte == 2) {
    j = ybyte; read_tet(); cur_loc.h = (j << 24) + tet;
  } else if (zbyte == 1) cur_loc.h = ybyte << 24;
  else mmo_err;
  read_tet(); cur_loc.l = tet;
  continue;
case lop_skip: cur_loc = incr(cur_loc, yzbytes); continue;

See also sections 34, 35, and 36.

This code is used in section 29.

34. Fixups load information out of order, when future references have been resolved.
The current file name and line number are not considered relevant.

⟨ Cases for lopcodes in the main loop 33 ⟩ +≡
case lop_fixo: if (zbyte == 2) {
    j = ybyte; read_tet(); tmp.h = (j << 24) + tet;
  } else if (zbyte == 1) tmp.h = ybyte << 24;
  else mmo_err;
  read_tet(); tmp.l = tet;
  mmo_load(tmp, cur_loc.h);
  mmo_load(incr(tmp, 4), cur_loc.l);
  continue;
case lop_fixr: delta = yzbytes;
  goto fixr;
case lop_fixrx: j = yzbytes; if (j != 16 && j != 24) mmo_err;
  read_tet();
  delta = tet;
  if (delta & 0xfe000000) mmo_err;
fixr: tmp = incr(cur_loc, -(delta >= 0x1000000 ? (delta & 0xffffff) - (1 << j) : delta) << 2);
  mmo_load(tmp, delta);
  continue;
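
The expression at label fixr is dense, so here is a sketch that isolates the address arithmetic. The function fix_offset is a hypothetical helper; it returns the byte offset, relative to cur_loc, of the tetrabyte that receives the fixup. Multiplication by 4 replaces the left shift so that the sketch avoids shifting a negative value.

```c
/* Byte offset from cur_loc of the tetrabyte to be fixed, following the
   expression at label fixr. For lop_fixrx, j is 16 or 24; for lop_fixr
   the value of j is irrelevant because delta < 0x1000000.
   Hypothetical helper for illustration only. */
long fix_offset(long delta, int j)
{
  /* a delta with bit 24 set stands for the negative number (delta & 0xffffff) - 2^j */
  long d = delta >= 0x1000000 ? (delta & 0xffffff) - (1L << j) : delta;
  return -d * 4;  /* delta counts tetrabytes; locations count bytes */
}
```

Thus a plain lop_fixr with delta = 3 patches the tetrabyte 12 bytes before cur_loc, while an extended fixup whose delta decodes to a negative number patches a location after cur_loc.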



buf: byte [], §25.
delta: int, §25.
fclose: int (), <stdio.h>.
file_no: unsigned char, §16.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
j: register int, §62.
l: tetra, §10.
line_no: unsigned short, §16.
ll: register mem_tetra *, §62.
lop_fixo = 0x3, §23.
lop_fixr = 0x4, §23.
lop_fixrx = 0x5, §23.
lop_loc = 0x1, §23.
lop_post = 0xa, §23.
lop_quote = 0x0, §23.
lop_skip = 0x2, §23.
mem_find: mem_tetra *(), §20.
mm = 0x98, §23.
mmo_err = macro, §26.
mmo_file: FILE *, §25.
octa = struct, §10.
postamble: int, §25.
read_tet: void (), §26.
tet: tetra, §25.
tet: tetra, §16.
tetra = unsigned int, §10.
yzbytes: int, §25.
35. The space for file names isn't allocated until we are sure we need it.

⟨ Cases for lopcodes in the main loop 33 ⟩ +≡
case lop_file: if (file_info[ybyte].name) {
    if (zbyte) mmo_err;
    cur_file = ybyte;
  } else {
    if (!zbyte) mmo_err;
    file_info[ybyte].name = (char *) calloc(4 * zbyte + 1, 1);
    if (!file_info[ybyte].name) {
      fprintf(stderr, "No room to store the file name!\n"); exit(-5);
    }
    cur_file = ybyte;
    for (j = zbyte, p = file_info[ybyte].name; j > 0; j--, p += 4) {
      read_tet();
      *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
    }
  }
  cur_line = 0; continue;
case lop_line: if (cur_file < 0) mmo_err;
  cur_line = yzbytes; continue;

36. Special bytes are ignored (at least for now).

⟨ Cases for lopcodes in the main loop 33 ⟩ +≡
case lop_spec: while (1) {
    read_tet();
    if (buf[0] == mm) {
      if (buf[1] != lop_quote || yzbytes != 1) goto loop; /* end of special data */
      read_tet();
    }
  }

37. Since a chunk of memory holds 512 tetrabytes, the ll pointer in the following
loop stays in the same chunk (namely, the first chunk of segment 3, also known as
Stack_Segment).

⟨ Load the postamble 37 ⟩ ≡
aux.h = 0x60000000; aux.l = 0x18;
ll = mem_find(aux);
(ll - 1)->tet = 2; /* this will ultimately set rL = 2 */
(ll - 5)->tet = argc; /* and $0 = argc */
(ll - 4)->tet = 0x40000000;
(ll - 3)->tet = 0x8; /* and $1 = Pool_Segment + 8 */
G = zbyte; L = 0;
for (j = G + G; j < 256 + 256; j++, ll++, aux.l += 4) read_tet(), ll->tet = tet;
inst_ptr.h = (ll - 2)->tet, inst_ptr.l = (ll - 1)->tet; /* Main */
(ll + 2 * 12)->tet = G << 24;
g[255] = incr(aux, 12 * 8); /* we will UNSAVE from here, to get going */

This code is used in section 32.
38. Loading and printing source lines. The loaded program generally
contains cross references to the lines of symbolic source files, so that the context of each
instruction can be understood. The following sections of this program make such
information available when it is desired.

Source file data is kept in a file_node structure:

⟨ Type declarations 9 ⟩ +≡
typedef struct {
  char *name; /* name of source file */
  int line_count; /* number of lines in the file */
  long *map; /* pointer to map of file positions */
} file_node;

39. In partial preparation for the day when source files are in Unicode, we define a
type Char for the source characters.

⟨ Type declarations 9 ⟩ +≡
typedef char Char; /* bytes that will become wydes some day */

40. ⟨ Global variables 19 ⟩ +≡
file_node file_info[256]; /* data about each source file */
int buf_size; /* size of buffer for source lines */
Char *buffer;

41. As in MMIXAL, we prefer source lines of length 72 characters or less, but the user
is allowed to increase the limit. (Longer lines will silently be truncated to the buffer
size when the simulator lists them.)

⟨ Initialize everything 14 ⟩ +≡
if (buf_size < 72) buf_size = 72;
buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
if (!buffer) panic("Can't allocate source line buffer");



argc: int, §141.
aux: octa, MMIX-ARITH §4.
buf: byte [], §25.
calloc: void *(), <stdlib.h>.
cur_file: int, §31.
cur_line: int, §31.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
l: tetra, §10.
L: register int, §75.
ll: register mem_tetra *, §62.
loop: label, §29.
lop_file = 0x6, §23.
lop_line = 0x7, §23.
lop_quote = 0x0, §23.
lop_spec = 0x8, §23.
mem_find: mem_tetra *(), §20.
mm = 0x98, §23.
mmo_err = macro, §26.
p: register char *, §62.
panic = macro (), §14.
read_tet: void (), §26.
rL = 20, §55.
stderr: FILE *, <stdio.h>.
tet: tetra, §16.
tet: tetra, §25.
ybyte = macro, §33.
yzbytes: int, §25.
zbyte = macro, §33.
42. The first time we are called upon to list a line from a given source file, we make
a map of starting locations for each line. Source files should contain at most 65535
lines. We assume that they contain no null characters.

⟨ Subroutines 12 ⟩ +≡
void make_map ARGS((void));
void make_map()
{
  long map[65536];
  register int k, l;
  register long *p;
  ⟨ Check if the source file has been modified 44 ⟩;
  for (l = 1; l < 65536 && !feof(src_file); l++) {
    map[l] = ftell(src_file);
  loop: if (!fgets(buffer, buf_size, src_file)) break;
    if (buffer[strlen(buffer) - 1] != '\n') goto loop;
  }
  file_info[cur_file].line_count = l;
  file_info[cur_file].map = p = (long *) calloc(l, sizeof(long));
  if (!p) panic("No room for a source-line map");
  for (k = 1; k < l; k++) p[k] = map[k];
}
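
The map-building loop can be tried on its own with standard C file routines. This is a sketch under stated assumptions: build_line_map is a hypothetical name, the caller supplies the map array and the line buffer, and the file is assumed to contain no null characters, as in the text above.

```c
#include <stdio.h>
#include <string.h>

/* Stores in map[1], map[2], ... the file offset at which each line starts and
   returns the index one past the last line found, so valid entries are
   map[1..result-1]. Lines longer than the buffer are read in several pieces.
   Hypothetical helper modeled on make_map. */
int build_line_map(FILE *f, long *map, int max_lines, char *buffer, int buf_size)
{
  int l;
  for (l = 1; l < max_lines && !feof(f); l++) {
    map[l] = ftell(f);                       /* where line l begins */
    for (;;) {                               /* keep reading until the newline arrives */
      if (!fgets(buffer, buf_size, f)) return l;
      if (buffer[strlen(buffer) - 1] == '\n') break;
    }
  }
  return l;
}
```

With a four-character buffer and a file holding the lines "ab", "cdefgh", and "i", the second line is consumed in three pieces but still gets a single map entry, and the function returns 4.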

43. We want to warn the user if the source file has changed since the object file
was written. The standard C library doesn't provide the information we need; so we
use the UNIX system function stat, in hopes that other operating systems provide a
similar way to do the job.

⟨ Preprocessor macros 11 ⟩ +≡
#include <sys/types.h>
#include <sys/stat.h>

44. ⟨ Check if the source file has been modified 44 ⟩ ≡
{
  struct stat stat_buf;
  if (stat(file_info[cur_file].name, &stat_buf) >= 0)
    if ((tetra) stat_buf.st_mtime > obj_time) fprintf(stderr,
        "Warning: File %s was modified; it may not match the program!\n",
        file_info[cur_file].name);
}

This code is used in section 42. 

45. Source lines are listed by the print_line routine, preceded by 12 characters
containing the line number. If a file error occurs, nothing is printed, not even an
error message; the absence of listed data is itself a message.

⟨ Subroutines 12 ⟩ +≡
void print_line ARGS((int));
void print_line(k)
  int k;
{
  char buf[11];
  if (k >= file_info[cur_file].line_count) return;
  if (fseek(src_file, file_info[cur_file].map[k], SEEK_SET) != 0) return;
  if (!fgets(buffer, buf_size, src_file)) return;
  sprintf(buf, "%d:    ", k);
  printf("line %.6s %s", buf, buffer);
  if (buffer[strlen(buffer) - 1] != '\n') printf("\n");
  line_shown = true;
}

46. ⟨ Preprocessor macros 11 ⟩ +≡
#ifndef SEEK_SET
#define SEEK_SET 0 /* code for setting the file pointer to a given offset */
#endif

47. The show_line routine is called when we want to output line cur_line of source
file number cur_file, assuming that cur_line ≠ 0. Its job is primarily to maintain
continuity, by opening or reopening the src_file if the source file changes, and by
connecting the previously output lines to the new one. Sometimes no output is
necessary, because the desired line has already been printed.

⟨ Subroutines 12 ⟩ +≡
void show_line ARGS((void));
void show_line()
{
  register int k;
  if (shown_file != cur_file) ⟨ Prepare to list lines from a new source file 49 ⟩
  else if (shown_line == cur_line) return; /* already shown */
  if (cur_line > shown_line + gap + 1 || cur_line < shown_line) {
    if (shown_line > 0)
      if (cur_line < shown_line) printf("--------\n"); /* indicate upward move */
      else printf("     ...\n"); /* indicate the gap */
    print_line(cur_line);
  } else for (k = shown_line + 1; k <= cur_line; k++) print_line(k);
  shown_line = cur_line;
}



ARGS = macro (), §11.
buf_size: int, §40.
buffer: Char *, §40.
calloc: void *(), <stdlib.h>.
cur_file: int, §31.
cur_line: int, §31.
feof: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
file_info: file_node [], §40.
fprintf: int (), <stdio.h>.
fseek: int (), <stdio.h>.
ftell: long (), <stdio.h>.
gap: int, §48.
line_count: int, §38.
line_shown: bool, §48.
map: long *, §38.
name: char *, §38.
obj_time: tetra, §31.
panic = macro (), §14.
printf: int (), <stdio.h>.
SEEK_SET = macro, <stdio.h>.
shown_file: int, §48.
shown_line: int, §48.
sprintf: int (), <stdio.h>.
src_file: FILE *, §48.
st_mtime: time_t, <sys/stat.h>.
stat: int (), <sys/stat.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
tetra = unsigned int, §10.
true = 1, §9.
48. ⟨ Global variables 19 ⟩ +≡
FILE *src_file; /* the currently open source file */
int shown_file = -1; /* index of the most recently listed file */
int shown_line; /* the line most recently listed in shown_file */
int gap; /* minimum gap between consecutively listed source lines */
bool line_shown; /* did we list anything recently? */
bool showing_source; /* are we listing source lines? */
int profile_gap; /* the gap when printing final frequencies */
bool profile_showing_source; /* showing_source within final frequencies */

49. ⟨ Prepare to list lines from a new source file 49 ⟩ ≡
{
  if (!src_file) src_file = fopen(file_info[cur_file].name, "r");
  else freopen(file_info[cur_file].name, "r", src_file);
  if (!src_file) {
    fprintf(stderr, "Warning: I can't open file %s; source listing omitted.\n",
        file_info[cur_file].name);
    showing_source = false;
    return;
  }
  printf("\"%s\"\n", file_info[cur_file].name);
  shown_file = cur_file;
  shown_line = 0;
  if (!file_info[cur_file].map) make_map();
}

This code is used in section 47. 

50. Here is a simple application of show_line. It is a recursive routine that prints
the frequency counts of all instructions that occur in a given subtree of the simulated
memory and that were executed at least once. The subtree is traversed in symmetric
order; therefore the frequencies appear in increasing order of the instruction locations.

⟨ Subroutines 12 ⟩ +≡
void print_freqs ARGS((mem_node *));
void print_freqs(p)
  mem_node *p;
{
  register int j;
  octa cur_loc;
  if (p->left) print_freqs(p->left);
  for (j = 0; j < 512; j++)
    if (p->dat[j].freq) ⟨ Print frequency data for location p->loc + 4 * j 51 ⟩;
  if (p->right) print_freqs(p->right);
}
51. An ellipsis (...) is printed between frequency data for nonconsecutive
instructions, unless source line information intervenes.

⟨ Print frequency data for location p->loc + 4 * j 51 ⟩ ≡
{
  cur_loc = incr(p->loc, 4 * j);
  if (showing_source && p->dat[j].line_no) {
    cur_file = p->dat[j].file_no, cur_line = p->dat[j].line_no;
    line_shown = false;
    show_line();
    if (line_shown) goto loc_implied;
  }
  if (cur_loc.l != implied_loc.l || cur_loc.h != implied_loc.h)
    if (profile_started) printf("         0.        ...\n");
loc_implied: printf("%10d. %08x%08x: %08x (%s)\n", p->dat[j].freq, cur_loc.h, cur_loc.l,
      p->dat[j].tet, info[p->dat[j].tet >> 24].name);
  implied_loc = incr(cur_loc, 4); profile_started = true;
}

This code is used in section 50.

52. ⟨ Global variables 19 ⟩ +≡
octa implied_loc; /* location following the last shown frequency data */
bool profile_started; /* have we printed at least one frequency count? */

53. ⟨ Print all the frequency counts 53 ⟩ ≡
{
  printf("\nProgram profile:\n");
  shown_file = cur_file = -1; shown_line = cur_line = 0;
  gap = profile_gap;
  showing_source = profile_showing_source;
  implied_loc = neg_one;
  print_freqs(mem_root);
}

This code is used in section 141.



ARGS = macro (), §11.
bool = enum, §9.
cur_file: int, §31.
cur_line: int, §31.
dat: mem_tetra [], §16.
false = 0, §9.
FILE, <stdio.h>.
file_info: file_node [], §40.
file_no: unsigned char, §16.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
freopen: FILE *(), <stdio.h>.
freq: tetra, §16.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
l: tetra, §10.
left: mem_node *, §16.
line_no: unsigned short, §16.
loc: octa, §16.
make_map: void (), §42.
map: long *, §38.
mem_node = struct, §16.
mem_root: mem_node *, §19.
name: char *, §38.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §10.
printf: int (), <stdio.h>.
right: mem_node *, §16.
show_line: void (), §47.
stderr: FILE *, <stdio.h>.
tet: tetra, §16.
true = 1, §9.
54. Lists. This simulator needs to deal with 256 different opcodes, so we might
as well enumerate them now.

⟨Type declarations 9⟩ +≡
  typedef enum {
    TRAP, FCMP, FUN, FEQL, FADD, FIX, FSUB, FIXU,
    FLOT, FLOTI, FLOTU, FLOTUI, SFLOT, SFLOTI, SFLOTU, SFLOTUI,
    FMUL, FCMPE, FUNE, FEQLE, FDIV, FSQRT, FREM, FINT,
    MUL, MULI, MULU, MULUI, DIV, DIVI, DIVU, DIVUI,
    ADD, ADDI, ADDU, ADDUI, SUB, SUBI, SUBU, SUBUI,
    IIADDU, IIADDUI, IVADDU, IVADDUI, VIIIADDU, VIIIADDUI, XVIADDU, XVIADDUI,
    CMP, CMPI, CMPU, CMPUI, NEG, NEGI, NEGU, NEGUI,
    SL, SLI, SLU, SLUI, SR, SRI, SRU, SRUI,
    BN, BNB, BZ, BZB, BP, BPB, BOD, BODB,
    BNN, BNNB, BNZ, BNZB, BNP, BNPB, BEV, BEVB,
    PBN, PBNB, PBZ, PBZB, PBP, PBPB, PBOD, PBODB,
    PBNN, PBNNB, PBNZ, PBNZB, PBNP, PBNPB, PBEV, PBEVB,
    CSN, CSNI, CSZ, CSZI, CSP, CSPI, CSOD, CSODI,
    CSNN, CSNNI, CSNZ, CSNZI, CSNP, CSNPI, CSEV, CSEVI,
    ZSN, ZSNI, ZSZ, ZSZI, ZSP, ZSPI, ZSOD, ZSODI,
    ZSNN, ZSNNI, ZSNZ, ZSNZI, ZSNP, ZSNPI, ZSEV, ZSEVI,
    LDB, LDBI, LDBU, LDBUI, LDW, LDWI, LDWU, LDWUI,
    LDT, LDTI, LDTU, LDTUI, LDO, LDOI, LDOU, LDOUI,
    LDSF, LDSFI, LDHT, LDHTI, CSWAP, CSWAPI, LDUNC, LDUNCI,
    LDVTS, LDVTSI, PRELD, PRELDI, PREGO, PREGOI, GO, GOI,
    STB, STBI, STBU, STBUI, STW, STWI, STWU, STWUI,
    STT, STTI, STTU, STTUI, STO, STOI, STOU, STOUI,
    STSF, STSFI, STHT, STHTI, STCO, STCOI, STUNC, STUNCI,
    SYNCD, SYNCDI, PREST, PRESTI, SYNCID, SYNCIDI, PUSHGO, PUSHGOI,
    OR, ORI, ORN, ORNI, NOR, NORI, XOR, XORI,
    AND, ANDI, ANDN, ANDNI, NAND, NANDI, NXOR, NXORI,
    BDIF, BDIFI, WDIF, WDIFI, TDIF, TDIFI, ODIF, ODIFI,
    MUX, MUXI, SADD, SADDI, MOR, MORI, MXOR, MXORI,
    SETH, SETMH, SETML, SETL, INCH, INCMH, INCML, INCL,
    ORH, ORMH, ORML, ORL, ANDNH, ANDNMH, ANDNML, ANDNL,
    JMP, JMPB, PUSHJ, PUSHJB, GETA, GETAB, PUT, PUTI,
    POP, RESUME, SAVE, UNSAVE, SYNC, SWYM, GET, TRIP
  } mmix_opcode;



55. We also need to enumerate the special names for special registers.

⟨Type declarations 9⟩ +≡
  typedef enum {
    rB, rD, rE, rH, rJ, rM, rR, rBB, rC, rN, rO, rS, rI, rT, rTT, rK, rQ, rU, rV, rG, rL,
    rA, rF, rP, rW, rX, rY, rZ, rWW, rXX, rYY, rZZ
  } special_reg;



56. ⟨Global variables 19⟩ +≡
  char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB",
    "rC", "rN", "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL",
    "rA", "rF", "rP", "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};
57. Here are the bit codes for arithmetic exceptions. These codes, except H_BIT,
are defined also in MMIX-ARITH.

#define X_BIT (1 << 8)  /* floating inexact */
#define Z_BIT (1 << 9)  /* floating division by zero */
#define U_BIT (1 << 10) /* floating underflow */
#define O_BIT (1 << 11) /* floating overflow */
#define I_BIT (1 << 12) /* floating invalid operation */
#define W_BIT (1 << 13) /* float-to-fix overflow */
#define V_BIT (1 << 14) /* integer overflow */
#define D_BIT (1 << 15) /* integer divide check */
#define H_BIT (1 << 16) /* trip */



58. The bkpt field associated with each tetrabyte of memory has bits associated
with forced tracing and/or breaking for reading, writing, and/or execution.

#define trace_bit (1 << 3)
#define read_bit (1 << 2)
#define write_bit (1 << 1)
#define exec_bit (1 << 0)

59. To complete our lists of lists, we enumerate the rudimentary operating system
calls that are built in to MMIXAL.

#define max_sys_call Ftell

⟨Type declarations 9⟩ +≡
  typedef enum {
    Halt, Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell
  } sys_call;



bkpt: unsigned char, §16. 
60. The main loop. Now let’s plunge in to the guts of the simulator, the master
switch that controls most of the action.

⟨Perform one instruction 60⟩ ≡
  {
    if (resuming) loc = incr(inst_ptr, -4), inst = g[rX].l;
    else ⟨Fetch the next instruction 63⟩;
    op = inst >> 24; xx = (inst >> 16) & 0xff; yy = (inst >> 8) & 0xff; zz = inst & 0xff;
    f = info[op].flags; yz = inst & 0xffff;
    x = y = z = a = b = zero_octa; exc = 0; old_L = L;
    if (f & rel_addr_bit) ⟨Convert relative address to absolute address 70⟩;
    ⟨Install operand fields 71⟩;
    if (f & X_is_dest_bit)
      ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩;
    w = oplus(y, z);
    if (loc.h >= 0x20000000) goto privileged_inst;
    switch (op) {
      ⟨Cases for individual MMIX instructions 84⟩;
    }
    ⟨Check for trip interrupt 122⟩;
    ⟨Update the clocks 127⟩;
    ⟨Trace the current instruction, if requested 128⟩;
    if (resuming && op != RESUME) resuming = false;
  }

This code is used in section 141. 
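
The field extraction at the top of the main loop can be tried in isolation. What follows is only a minimal sketch, not part of the simulator: the struct decoded_inst and the function decode_inst are hypothetical names introduced for illustration, whereas the simulator keeps these fields in the separate register variables op, xx, yy, zz, and yz.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields (illustration only). */
typedef struct {
    unsigned op, xx, yy, zz, yz;
} decoded_inst;

/* Split a 32-bit MMIX instruction into OP, X, Y, Z and the 16-bit YZ field,
   using the same shifts and masks as the main loop. */
static decoded_inst decode_inst(uint32_t inst)
{
    decoded_inst d;
    d.op = inst >> 24;
    d.xx = (inst >> 16) & 0xff;
    d.yy = (inst >> 8) & 0xff;
    d.zz = inst & 0xff;
    d.yz = inst & 0xffff; /* same bits as (yy << 8) | zz */
    return d;
}
```

For instance, the tetrabyte 0x20010203 decodes to op = 0x20 (ADD in the enumeration of section 54), xx = 1, yy = 2, zz = 3.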

61. Operands x and a are usually destinations (results), computed from the source 
operands y, z, and/or b. 

⟨Global variables 19⟩ +≡
  octa w, x, y, z, a, b, ma, mb; /* operands */
  octa *x_ptr; /* destination */
  octa loc; /* location of the current instruction */
  octa inst_ptr; /* location of the next instruction */
  tetra inst; /* the current instruction */
  int old_L; /* value of L before the current instruction */
  int exc; /* exceptions raised by the current instruction */
  int tracing_exceptions; /* exception bits that cause tracing */
  int rop; /* ropcode of a resumed instruction */
  int round_mode; /* the style of floating point rounding just used */
  bool resuming; /* are we resuming an interrupted instruction? */
  bool halted; /* did the program come to a halt? */
  bool breakpoint; /* should we pause after the current instruction? */
  bool tracing; /* should we trace the current instruction? */
  bool stack_tracing; /* should we trace details of the register stack? */
  bool interacting; /* are we in interactive mode? */
  bool interact_after_break; /* should we go into interactive mode? */
  bool tripping; /* are we about to go to a trip handler? */
  bool good; /* did the last branch instruction guess correctly? */
  tetra trace_threshold; /* each instruction should be traced this many times */
62. ⟨Local registers 62⟩ ≡
  register mmix_opcode op; /* operation code of the current instruction */
  register int xx, yy, zz, yz; /* operand fields of the current instruction */
  register tetra f; /* properties of the current op */
  register int i, j, k; /* miscellaneous indices */
  register mem_tetra *ll; /* current place in the simulated memory */
  register char *p; /* current place in a string */

See also section 75. 

This code is used in section 141. 

63. ⟨Fetch the next instruction 63⟩ ≡
  {
    loc = inst_ptr;
    ll = mem_find(loc);
    inst = ll->tet;
    cur_file = ll->file_no;
    cur_line = ll->line_no;
    ll->freq++;
    if (ll->bkpt & exec_bit) breakpoint = true;
    tracing = breakpoint || (ll->bkpt & trace_bit) || (ll->freq <= trace_threshold);
    inst_ptr = incr(inst_ptr, 4);
  }

This code is used in section 60. 

64. Much of the simulation is table-driven, based on a static data structure called
the op_info for each operation code.

⟨Type declarations 9⟩ +≡
  typedef struct {
    char *name; /* symbolic name of an opcode */
    unsigned char flags; /* its instruction format */
    unsigned char third_operand; /* its special register input */
    unsigned char mems; /* how many μ it costs */
    unsigned char oops; /* how many υ it costs */
    char *trace_format; /* how it appears when traced */
  } op_info;



bkpt: unsigned char, §16.
bool = enum, §9.
cur_file: int, §31.
cur_line: int, §31.
exec_bit = macro, §58.
false = 0, §9.
file_no: unsigned char, §16.
freq: tetra, §16.
g: octa [], §76.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
l: tetra, §10.
L: register int, §75.
line_no: unsigned short, §16.
mem_find: mem_tetra *(), §20.
mem_tetra = struct, §16.
mmix_opcode = enum, §54.
octa = struct, §10.
oplus: octa (), MMIX-ARITH §5.
privileged_inst: label, §107.
rel_addr_bit = #40, §65.
RESUME = #f9, §54.
rX = 25, §55.
tet: tetra, §16.
tetra = unsigned int, §10.
trace_bit = macro, §58.
true = 1, §9.
X_is_dest_bit = #20, §65.
zero_octa: octa, MMIX-ARITH §4.
65. For example, the flags field of info[op] tells us how to obtain the operands from
the X, Y, and Z fields of the current instruction. Each entry records special properties
of an operation code, in binary notation: #1 means Z is an immediate value, #2 means
rZ is a source operand, #4 means Y is an immediate value, #8 means rY is a source
operand, #10 means rX is a source operand, #20 means rX is a destination, #40 means
YZ is part of a relative address, #80 means a push or pop or unsave instruction.

The trace_format field will be explained later.

#define Z_is_immed_bit 0x1
#define Z_is_source_bit 0x2
#define Y_is_immed_bit 0x4
#define Y_is_source_bit 0x8
#define X_is_source_bit 0x10
#define X_is_dest_bit 0x20
#define rel_addr_bit 0x40
#define push_pop_bit 0x80

⟨Global variables 19⟩ +≡
  op_info info[256] = {⟨Info for arithmetic commands 66⟩, ⟨Info for branch commands 67⟩,
    ⟨Info for load/store commands 68⟩, ⟨Info for logical and control commands 69⟩};
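
To see how a flags byte is read, here is a small sketch that is not part of the simulator. The bit names repeat the definitions above; the helper functions count_source_registers and writes_register_x are hypothetical and exist only for illustration.

```c
/* Flag bits, with the same values as the definitions in section 65. */
enum {
    Z_is_immed_bit  = 0x1,
    Z_is_source_bit = 0x2,
    Y_is_immed_bit  = 0x4,
    Y_is_source_bit = 0x8,
    X_is_source_bit = 0x10,
    X_is_dest_bit   = 0x20,
    rel_addr_bit    = 0x40,
    push_pop_bit    = 0x80
};

/* Hypothetical helper: how many of rX, rY, rZ are read as sources? */
static int count_source_registers(unsigned char f)
{
    int n = 0;
    if (f & Z_is_source_bit) n++;
    if (f & Y_is_source_bit) n++;
    if (f & X_is_source_bit) n++;
    return n;
}

/* Hypothetical helper: does the instruction write register X? */
static int writes_register_x(unsigned char f)
{
    return (f & X_is_dest_bit) != 0;
}
```

With the table entries that follow, ADD has flags #2a (two register sources, rX as destination), ADDI has #29 (Z immediate, one register source), and STB has #1a (three register sources and no destination).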

66. ⟨Info for arithmetic commands 66⟩ ≡
  {"TRAP",0x0a,255,0,5,"%r"},
  {"FCMP",0x2a,0,0,1,"%l = %.y cmp %.z = %x"},
  {"FUN",0x2a,0,0,1,"%l = [%.y(||)%.z] = %x"},
  {"FEQL",0x2a,0,0,1,"%l = [%.y(==)%.z] = %x"},
  {"FADD",0x2a,0,0,4,"%l = %.y %(+%) %.z = %.x"},
  {"FIX",0x26,0,0,4,"%l = %(fix%) %.z = %x"},
  {"FSUB",0x2a,0,0,4,"%l = %.y %(-%) %.z = %.x"},
  {"FIXU",0x26,0,0,4,"%l = %(fix%) %.z = %#x"},
  {"FLOT",0x26,0,0,4,"%l = %(flot%) %z = %.x"},
  {"FLOTI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},
  {"FLOTU",0x26,0,0,4,"%l = %(flot%) %#z = %.x"},
  {"FLOTUI",0x25,0,0,4,"%l = %(flot%) %z = %.x"},
  {"SFLOT",0x26,0,0,4,"%l = %(sflot%) %z = %.x"},
  {"SFLOTI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},
  {"SFLOTU",0x26,0,0,4,"%l = %(sflot%) %#z = %.x"},
  {"SFLOTUI",0x25,0,0,4,"%l = %(sflot%) %z = %.x"},
  {"FMUL",0x2a,0,0,4,"%l = %.y %(*%) %.z = %.x"},
  {"FCMPE",0x2a,rE,0,4,"%l = %.y cmp %.z (%.b)) = %x"},
  {"FUNE",0x2a,rE,0,1,"%l = [%.y(||)%.z (%.b)] = %x"},
  {"FEQLE",0x2a,rE,0,4,"%l = [%.y(==)%.z (%.b)] = %x"},
  {"FDIV",0x2a,0,0,40,"%l = %.y %(/%) %.z = %.x"},
  {"FSQRT",0x26,0,0,40,"%l = %(sqrt%) %.z = %.x"},
  {"FREM",0x2a,0,0,4,"%l = %.y %(rem%) %.z = %.x"},
  {"FINT",0x26,0,0,4,"%l = %(int%) %.z = %.x"},
  {"MUL",0x2a,0,0,10,"%l = %y * %z = %x"},
  {"MULI",0x29,0,0,10,"%l = %y * %z = %x"},
  {"MULU",0x2a,0,0,10,"%l = %#y * %#z = %#x, rH=%#a"},
  {"MULUI",0x29,0,0,10,"%l = %#y * %z = %#x, rH=%#a"},
  {"DIV",0x2a,0,0,60,"%l = %y / %z = %x, rR=%a"},
  {"DIVI",0x29,0,0,60,"%l = %y / %z = %x, rR=%a"},
  {"DIVU",0x2a,rD,0,60,"%l = %#b%0y / %#z = %#x, rR=%#a"},
  {"DIVUI",0x29,rD,0,60,"%l = %#b%0y / %z = %#x, rR=%#a"},
  {"ADD",0x2a,0,0,1,"%l = %y + %z = %x"},
  {"ADDI",0x29,0,0,1,"%l = %y + %z = %x"},
  {"ADDU",0x2a,0,0,1,"%l = %#y + %#z = %#x"},
  {"ADDUI",0x29,0,0,1,"%l = %#y + %z = %#x"},
  {"SUB",0x2a,0,0,1,"%l = %y - %z = %x"},
  {"SUBI",0x29,0,0,1,"%l = %y - %z = %x"},
  {"SUBU",0x2a,0,0,1,"%l = %#y - %#z = %#x"},
  {"SUBUI",0x29,0,0,1,"%l = %#y - %z = %#x"},
  {"2ADDU",0x2a,0,0,1,"%l = %#y <<1+ %#z = %#x"},
  {"2ADDUI",0x29,0,0,1,"%l = %#y <<1+ %z = %#x"},
  {"4ADDU",0x2a,0,0,1,"%l = %#y <<2+ %#z = %#x"},
  {"4ADDUI",0x29,0,0,1,"%l = %#y <<2+ %z = %#x"},
  {"8ADDU",0x2a,0,0,1,"%l = %#y <<3+ %#z = %#x"},
  {"8ADDUI",0x29,0,0,1,"%l = %#y <<3+ %z = %#x"},
  {"16ADDU",0x2a,0,0,1,"%l = %#y <<4+ %#z = %#x"},
  {"16ADDUI",0x29,0,0,1,"%l = %#y <<4+ %z = %#x"},
  {"CMP",0x2a,0,0,1,"%l = %y cmp %z = %x"},
  {"CMPI",0x29,0,0,1,"%l = %y cmp %z = %x"},
  {"CMPU",0x2a,0,0,1,"%l = %#y cmp %#z = %x"},
  {"CMPUI",0x29,0,0,1,"%l = %#y cmp %z = %x"},
  {"NEG",0x26,0,0,1,"%l = %y - %z = %x"},
  {"NEGI",0x25,0,0,1,"%l = %y - %z = %x"},
  {"NEGU",0x26,0,0,1,"%l = %y - %#z = %#x"},
  {"NEGUI",0x25,0,0,1,"%l = %y - %z = %#x"},
  {"SL",0x2a,0,0,1,"%l = %y << %#z = %x"},
  {"SLI",0x29,0,0,1,"%l = %y << %z = %x"},
  {"SLU",0x2a,0,0,1,"%l = %#y << %#z = %#x"},
  {"SLUI",0x29,0,0,1,"%l = %#y << %z = %#x"},
  {"SR",0x2a,0,0,1,"%l = %y >> %#z = %x"},
  {"SRI",0x29,0,0,1,"%l = %y >> %z = %x"},
  {"SRU",0x2a,0,0,1,"%l = %#y >> %#z = %#x"},
  {"SRUI",0x29,0,0,1,"%l = %#y >> %z = %#x"}

This code is used in section 65. 



flags: unsigned char, §64.
op: register mmix_opcode, §62.
op_info = struct, §64.
rD = 1, §55.
rE = 2, §55.
trace_format: char *, §64.






67. ⟨Info for branch commands 67⟩ ≡
  {"BN",0x50,0,0,1,"%b<0? %t%g"},
  {"BNB",0x50,0,0,1,"%b<0? %t%g"},
  {"BZ",0x50,0,0,1,"%b==0? %t%g"},
  {"BZB",0x50,0,0,1,"%b==0? %t%g"},
  {"BP",0x50,0,0,1,"%b>0? %t%g"},
  {"BPB",0x50,0,0,1,"%b>0? %t%g"},
  {"BOD",0x50,0,0,1,"%b odd? %t%g"},
  {"BODB",0x50,0,0,1,"%b odd? %t%g"},
  {"BNN",0x50,0,0,1,"%b>=0? %t%g"},
  {"BNNB",0x50,0,0,1,"%b>=0? %t%g"},
  {"BNZ",0x50,0,0,1,"%b!=0? %t%g"},
  {"BNZB",0x50,0,0,1,"%b!=0? %t%g"},
  {"BNP",0x50,0,0,1,"%b<=0? %t%g"},
  {"BNPB",0x50,0,0,1,"%b<=0? %t%g"},
  {"BEV",0x50,0,0,1,"%b even? %t%g"},
  {"BEVB",0x50,0,0,1,"%b even? %t%g"},
  {"PBN",0x50,0,0,1,"%b<0? %t%g"},
  {"PBNB",0x50,0,0,1,"%b<0? %t%g"},
  {"PBZ",0x50,0,0,1,"%b==0? %t%g"},
  {"PBZB",0x50,0,0,1,"%b==0? %t%g"},
  {"PBP",0x50,0,0,1,"%b>0? %t%g"},
  {"PBPB",0x50,0,0,1,"%b>0? %t%g"},
  {"PBOD",0x50,0,0,1,"%b odd? %t%g"},
  {"PBODB",0x50,0,0,1,"%b odd? %t%g"},
  {"PBNN",0x50,0,0,1,"%b>=0? %t%g"},
  {"PBNNB",0x50,0,0,1,"%b>=0? %t%g"},
  {"PBNZ",0x50,0,0,1,"%b!=0? %t%g"},
  {"PBNZB",0x50,0,0,1,"%b!=0? %t%g"},
  {"PBNP",0x50,0,0,1,"%b<=0? %t%g"},
  {"PBNPB",0x50,0,0,1,"%b<=0? %t%g"},
  {"PBEV",0x50,0,0,1,"%b even? %t%g"},
  {"PBEVB",0x50,0,0,1,"%b even? %t%g"},
  {"CSN",0x3a,0,0,1,"%l = %y<0? %z: %b = %x"},
  {"CSNI",0x39,0,0,1,"%l = %y<0? %z: %b = %x"},
  {"CSZ",0x3a,0,0,1,"%l = %y==0? %z: %b = %x"},
  {"CSZI",0x39,0,0,1,"%l = %y==0? %z: %b = %x"},
  {"CSP",0x3a,0,0,1,"%l = %y>0? %z: %b = %x"},
  {"CSPI",0x39,0,0,1,"%l = %y>0? %z: %b = %x"},
  {"CSOD",0x3a,0,0,1,"%l = %y odd? %z: %b = %x"},
  {"CSODI",0x39,0,0,1,"%l = %y odd? %z: %b = %x"},
  {"CSNN",0x3a,0,0,1,"%l = %y>=0? %z: %b = %x"},
  {"CSNNI",0x39,0,0,1,"%l = %y>=0? %z: %b = %x"},
  {"CSNZ",0x3a,0,0,1,"%l = %y!=0? %z: %b = %x"},
  {"CSNZI",0x39,0,0,1,"%l = %y!=0? %z: %b = %x"},
  {"CSNP",0x3a,0,0,1,"%l = %y<=0? %z: %b = %x"},
  {"CSNPI",0x39,0,0,1,"%l = %y<=0? %z: %b = %x"},
  {"CSEV",0x3a,0,0,1,"%l = %y even? %z: %b = %x"},
  {"CSEVI",0x39,0,0,1,"%l = %y even? %z: %b = %x"},
  {"ZSN",0x2a,0,0,1,"%l = %y<0? %z: 0 = %x"},
  {"ZSNI",0x29,0,0,1,"%l = %y<0? %z: 0 = %x"},
  {"ZSZ",0x2a,0,0,1,"%l = %y==0? %z: 0 = %x"},
  {"ZSZI",0x29,0,0,1,"%l = %y==0? %z: 0 = %x"},
  {"ZSP",0x2a,0,0,1,"%l = %y>0? %z: 0 = %x"},
  {"ZSPI",0x29,0,0,1,"%l = %y>0? %z: 0 = %x"},
  {"ZSOD",0x2a,0,0,1,"%l = %y odd? %z: 0 = %x"},
  {"ZSODI",0x29,0,0,1,"%l = %y odd? %z: 0 = %x"},
  {"ZSNN",0x2a,0,0,1,"%l = %y>=0? %z: 0 = %x"},
  {"ZSNNI",0x29,0,0,1,"%l = %y>=0? %z: 0 = %x"},
  {"ZSNZ",0x2a,0,0,1,"%l = %y!=0? %z: 0 = %x"},
  {"ZSNZI",0x29,0,0,1,"%l = %y!=0? %z: 0 = %x"},
  {"ZSNP",0x2a,0,0,1,"%l = %y<=0? %z: 0 = %x"},
  {"ZSNPI",0x29,0,0,1,"%l = %y<=0? %z: 0 = %x"},
  {"ZSEV",0x2a,0,0,1,"%l = %y even? %z: 0 = %x"},
  {"ZSEVI",0x29,0,0,1,"%l = %y even? %z: 0 = %x"}
This code is used in section 65. 
68. ⟨Info for load/store commands 68⟩ ≡
  {"LDB",0x2a,0,1,1,"%l = M1[%#y+%#z] = %x"},
  {"LDBI",0x29,0,1,1,"%l = M1[%#y%?+] = %x"},
  {"LDBU",0x2a,0,1,1,"%l = M1[%#y+%#z] = %#x"},
  {"LDBUI",0x29,0,1,1,"%l = M1[%#y%?+] = %#x"},
  {"LDW",0x2a,0,1,1,"%l = M2[%#y+%#z] = %x"},
  {"LDWI",0x29,0,1,1,"%l = M2[%#y%?+] = %x"},
  {"LDWU",0x2a,0,1,1,"%l = M2[%#y+%#z] = %#x"},
  {"LDWUI",0x29,0,1,1,"%l = M2[%#y%?+] = %#x"},
  {"LDT",0x2a,0,1,1,"%l = M4[%#y+%#z] = %x"},
  {"LDTI",0x29,0,1,1,"%l = M4[%#y%?+] = %x"},
  {"LDTU",0x2a,0,1,1,"%l = M4[%#y+%#z] = %#x"},
  {"LDTUI",0x29,0,1,1,"%l = M4[%#y%?+] = %#x"},
  {"LDO",0x2a,0,1,1,"%l = M8[%#y+%#z] = %x"},
  {"LDOI",0x29,0,1,1,"%l = M8[%#y%?+] = %x"},
  {"LDOU",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},
  {"LDOUI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},
  {"LDSF",0x2a,0,1,1,"%l = (M4[%#y+%#z]) = %.x"},
  {"LDSFI",0x29,0,1,1,"%l = (M4[%#y%?+]) = %.x"},
  {"LDHT",0x2a,0,1,1,"%l = M4[%#y+%#z]<<32 = %#x"},
  {"LDHTI",0x29,0,1,1,"%l = M4[%#y%?+]<<32 = %#x"},
  {"CSWAP",0x3a,0,2,2,"%l = [M8[%#y+%#z]==%a] = %x, %r"},
  {"CSWAPI",0x39,0,2,2,"%l = [M8[%#y%?+]==%a] = %x, %r"},
  {"LDUNC",0x2a,0,1,1,"%l = M8[%#y+%#z] = %#x"},
  {"LDUNCI",0x29,0,1,1,"%l = M8[%#y%?+] = %#x"},
  {"LDVTS",0x2a,0,0,1,""},
  {"LDVTSI",0x29,0,0,1,""},
  {"PRELD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},
  {"PRELDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},
  {"PREGO",0x0a,0,0,1,"[%#y+%#z .. %#x]"},
  {"PREGOI",0x09,0,0,1,"[%#y%?+ .. %#x]"},
  {"GO",0x2a,0,0,3,"%l = %#x, -> %#y+%#z"},
  {"GOI",0x29,0,0,3,"%l = %#x, -> %#y%?+"},
  {"STB",0x1a,0,1,1,"M1[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STBI",0x19,0,1,1,"M1[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STBU",0x1a,0,1,1,"M1[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STBUI",0x19,0,1,1,"M1[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STW",0x1a,0,1,1,"M2[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STWI",0x19,0,1,1,"M2[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STWU",0x1a,0,1,1,"M2[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STWUI",0x19,0,1,1,"M2[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STT",0x1a,0,1,1,"M4[%#y+%#z] = %b, M8[%#w]=%#a"},
  {"STTI",0x19,0,1,1,"M4[%#y%?+] = %b, M8[%#w]=%#a"},
  {"STTU",0x1a,0,1,1,"M4[%#y+%#z] = %#b, M8[%#w]=%#a"},
  {"STTUI",0x19,0,1,1,"M4[%#y%?+] = %#b, M8[%#w]=%#a"},
  {"STO",0x1a,0,1,1,"M8[%#y+%#z] = %b"},
  {"STOI",0x19,0,1,1,"M8[%#y%?+] = %b"},
  {"STOU",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},
  {"STOUI",0x19,0,1,1,"M8[%#y%?+] = %#b"},
  {"STSF",0x1a,0,1,1,"%(M4[%#y+%#z]%) = %.b, M8[%#w]=%#a"},
  {"STSFI",0x19,0,1,1,"%(M4[%#y%?+]%) = %.b, M8[%#w]=%#a"},
  {"STHT",0x1a,0,1,1,"M4[%#y+%#z] = %#b>>32, M8[%#w]=%#a"},
  {"STHTI",0x19,0,1,1,"M4[%#y%?+] = %#b>>32, M8[%#w]=%#a"},
  {"STCO",0x0a,0,1,1,"M8[%#y+%#z] = %b"},
  {"STCOI",0x09,0,1,1,"M8[%#y%?+] = %b"},
  {"STUNC",0x1a,0,1,1,"M8[%#y+%#z] = %#b"},
  {"STUNCI",0x19,0,1,1,"M8[%#y%?+] = %#b"},
  {"SYNCD",0x0a,0,0,1,"[%#y+%#z .. %#x]"},
  {"SYNCDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},
  {"PREST",0x0a,0,0,1,"[%#y+%#z .. %#x]"},
  {"PRESTI",0x09,0,0,1,"[%#y%?+ .. %#x]"},
  {"SYNCID",0x0a,0,0,1,"[%#y+%#z .. %#x]"},
  {"SYNCIDI",0x09,0,0,1,"[%#y%?+ .. %#x]"},
  {"PUSHGO",0xaa,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y+%#z"},
  {"PUSHGOI",0xa9,0,0,3,"%lrO=%#b, rL=%a, rJ=%#x, -> %#y%?+"}
This code is used in section 65. 
69. ⟨Info for logical and control commands 69⟩ ≡
  {"OR",0x2a,0,0,1,"%l = %#y | %#z = %#x"},
  {"ORI",0x29,0,0,1,"%l = %#y | %z = %#x"},
  {"ORN",0x2a,0,0,1,"%l = %#y |~ %#z = %#x"},
  {"ORNI",0x29,0,0,1,"%l = %#y |~ %z = %#x"},
  {"NOR",0x2a,0,0,1,"%l = %#y ~| %#z = %#x"},
  {"NORI",0x29,0,0,1,"%l = %#y ~| %z = %#x"},
  {"XOR",0x2a,0,0,1,"%l = %#y ^ %#z = %#x"},
  {"XORI",0x29,0,0,1,"%l = %#y ^ %z = %#x"},
  {"AND",0x2a,0,0,1,"%l = %#y & %#z = %#x"},
  {"ANDI",0x29,0,0,1,"%l = %#y & %z = %#x"},
  {"ANDN",0x2a,0,0,1,"%l = %#y \\ %#z = %#x"},
  {"ANDNI",0x29,0,0,1,"%l = %#y \\ %z = %#x"},
  {"NAND",0x2a,0,0,1,"%l = %#y ~& %#z = %#x"},
  {"NANDI",0x29,0,0,1,"%l = %#y ~& %z = %#x"},
  {"NXOR",0x2a,0,0,1,"%l = %#y ~^ %#z = %#x"},
  {"NXORI",0x29,0,0,1,"%l = %#y ~^ %z = %#x"},
  {"BDIF",0x2a,0,0,1,"%l = %#y bdif %#z = %#x"},
  {"BDIFI",0x29,0,0,1,"%l = %#y bdif %z = %#x"},
  {"WDIF",0x2a,0,0,1,"%l = %#y wdif %#z = %#x"},
  {"WDIFI",0x29,0,0,1,"%l = %#y wdif %z = %#x"},
  {"TDIF",0x2a,0,0,1,"%l = %#y tdif %#z = %#x"},
  {"TDIFI",0x29,0,0,1,"%l = %#y tdif %z = %#x"},
  {"ODIF",0x2a,0,0,1,"%l = %#y odif %#z = %#x"},
  {"ODIFI",0x29,0,0,1,"%l = %#y odif %z = %#x"},
  {"MUX",0x2a,rM,0,1,"%l = %#b? %#y: %#z = %#x"},
  {"MUXI",0x29,rM,0,1,"%l = %#b? %#y: %z = %#x"},
  {"SADD",0x2a,0,0,1,"%l = nu(%#y\\%#z) = %x"},
  {"SADDI",0x29,0,0,1,"%l = nu(%#y%?\\) = %x"},
  {"MOR",0x2a,0,0,1,"%l = %#y mor %#z = %#x"},
  {"MORI",0x29,0,0,1,"%l = %#y mor %z = %#x"},
  {"MXOR",0x2a,0,0,1,"%l = %#y mxor %#z = %#x"},
  {"MXORI",0x29,0,0,1,"%l = %#y mxor %z = %#x"},
  {"SETH",0x20,0,0,1,"%l = %#z"},
  {"SETMH",0x20,0,0,1,"%l = %#z"},
  {"SETML",0x20,0,0,1,"%l = %#z"},
  {"SETL",0x20,0,0,1,"%l = %#z"},
  {"INCH",0x30,0,0,1,"%l = %#y + %#z = %#x"},
  {"INCMH",0x30,0,0,1,"%l = %#y + %#z = %#x"},
  {"INCML",0x30,0,0,1,"%l = %#y + %#z = %#x"},
  {"INCL",0x30,0,0,1,"%l = %#y + %#z = %#x"},
  {"ORH",0x30,0,0,1,"%l = %#y | %#z = %#x"},
  {"ORMH",0x30,0,0,1,"%l = %#y | %#z = %#x"},
  {"ORML",0x30,0,0,1,"%l = %#y | %#z = %#x"},
  {"ORL",0x30,0,0,1,"%l = %#y | %#z = %#x"},
  {"ANDNH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},
  {"ANDNMH",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},
  {"ANDNML",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},
  {"ANDNL",0x30,0,0,1,"%l = %#y \\ %#z = %#x"},
  {"JMP",0x40,0,0,1,"-> %#z"},
  {"JMPB",0x40,0,0,1,"-> %#z"},
  {"PUSHJ",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"PUSHJB",0xe0,0,0,1,"%lrO=%#b, rL=%a, rJ=%#x, -> %#z"},
  {"GETA",0x60,0,0,1,"%l = %#z"},
  {"GETAB",0x60,0,0,1,"%l = %#z"},
  {"PUT",0x02,0,0,1,"%s = %r"},
  {"PUTI",0x01,0,0,1,"%s = %r"},
  {"POP",0x80,rJ,0,3,"%lrL=%a, rO=%#b, -> %#y%?+"},
  {"RESUME",0x00,0,0,5,"{%#b} -> %#z"},
  {"SAVE",0x20,0,20,1,"%l = %#x"},
  {"UNSAVE",0x82,0,20,1,"%#z: rG=%x, ..., rL=%a"},
  {"SYNC",0x01,0,0,1,""},
  {"SWYM",0x00,0,0,1,""},
  {"GET",0x20,0,0,1,"%l = %s = %#x"},
  {"TRIP",0x0a,255,0,5,"rW=%#w, rX=%#x, rY=%#y, rZ=%#z, rB=%#b, g[255]=%#a"}
This code is used in section 65. 

70. ⟨Convert relative address to absolute address 70⟩ ≡
  {
    if ((op & 0xfe) == JMP) yz = inst & 0xffffff;
    if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);
    y = inst_ptr; z = incr(loc, yz << 2);
  }

This code is used in section 60. 
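
The conversion can be checked on a few concrete instructions. The following is a sketch under stated assumptions rather than simulator code: it uses a plain 64-bit integer for the location instead of the octa type, and the function name branch_target is hypothetical. The opcode values #f0 and #f1 for JMP and JMPB come from the enumeration in section 54.

```c
#include <stdint.h>

#define JMP  0xf0
#define JMPB 0xf1

/* Compute the absolute target of a relative-address instruction at loc.
   Odd opcodes are the backward forms; the offset counts tetrabytes. */
static uint64_t branch_target(uint64_t loc, uint32_t inst)
{
    unsigned op = inst >> 24;
    int64_t yz = inst & 0xffff;                     /* 16-bit offset */
    if ((op & 0xfe) == JMP) yz = inst & 0xffffff;   /* JMP has 24 bits */
    if (op & 1) yz -= (op == JMPB ? 0x1000000 : 0x10000);
    return loc + (uint64_t)(yz * 4);                /* wraps modulo 2^64 */
}
```

For example, a BZB instruction (opcode #43) at location #100 with YZ = #ffff has offset -1 and therefore targets #fc.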

71. ⟨Install operand fields 71⟩ ≡
  if (resuming && rop != RESUME_AGAIN)
    ⟨Install special operands when resuming an interrupted operation 126⟩
  else {
    if (f & 0x10) ⟨Set b from register X 74⟩;
    if (info[op].third_operand) ⟨Set b from special register 79⟩;
    if (f & 0x1) z.l = zz;
    else if (f & 0x2) ⟨Set z from register Z 72⟩
    else if ((op & 0xf0) == SETH) ⟨Set z as an immediate wyde 78⟩;
    if (f & 0x4) y.l = yy;
    else if (f & 0x8) ⟨Set y from register Y 73⟩;
  }

This code is used in section 60. 



b: octa, §61.
f: register tetra, §62.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
inst: tetra, §61.
inst_ptr: octa, §61.
JMP = #f0, §54.
JMPB = #f1, §54.
l: tetra, §10.
loc: octa, §61.
op: register mmix_opcode, §62.
RESUME_AGAIN = 0, §125.
resuming: bool, §61.
rJ = 4, §55.
rM = 5, §55.
rop: int, §61.
SETH = #e0, §54.
third_operand: unsigned char, §64.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zz: register int, §62.
72. There are 256 global registers, g[0] through g[255]; the first 32 of them are used
for the special registers rA, rB, etc. There are lring_mask + 1 local registers, usually
256 but the user can increase this to a larger power of 2 if desired.

The current values of rL, rG, rO, and rS are kept in separate variables called L, G,
O, and S for convenience. (In fact, O and S actually hold the values rO/8 and rS/8,
modulo lring_size.)

⟨Set z from register Z 72⟩ ≡
  {
    if (zz >= G) z = g[zz];
    else if (zz < L) z = l[(O + zz) & lring_mask];
  }

This code is used in section 71. 
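
The same three-way lookup is used for registers X, Y, and Z. As a minimal sketch, assuming a hypothetical struct regfile that bundles the globals g, l, lring_mask, G, L, and O (the simulator keeps them as separate variables), the rule can be written as one function. A register number that is neither global nor local is "marginal" and reads as zero, which matches the way the main loop clears the operands before installing them.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa; /* high and low tetrabytes */

/* Hypothetical bundle of the register state (illustration only). */
typedef struct {
    octa g[256];     /* global registers */
    octa *l;         /* ring of local registers */
    int lring_mask;  /* ring size minus 1; the size is a power of 2 */
    int G, L, O;     /* copies of rG, rL, and rO/8 */
} regfile;

/* Read register $k: global if k >= G, local if k < L, otherwise zero. */
static octa read_reg(const regfile *r, int k)
{
    octa zero = {0, 0};
    if (k >= r->G) return r->g[k];
    if (k < r->L) return r->l[(r->O + k) & r->lring_mask];
    return zero;
}
```

Because local registers are addressed as (O + k) & lring_mask, pushing or popping the register stack only changes O; no register contents need to move.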

73. ⟨Set y from register Y 73⟩ ≡
  {
    if (yy >= G) y = g[yy];
    else if (yy < L) y = l[(O + yy) & lring_mask];
  }

This code is used in section 71. 

74. ⟨Set b from register X 74⟩ ≡
  {
    if (xx >= G) b = g[xx];
    else if (xx < L) b = l[(O + xx) & lring_mask];
  }

This code is used in section 71. 

75. ⟨Local registers 62⟩ +≡
  register int G, L, O; /* accessible copies of key registers */

76. ⟨Global variables 19⟩ +≡
  octa g[256]; /* global registers */
  octa *l; /* local registers */
  int lring_size; /* the number of local registers (a power of 2) */
  int lring_mask; /* one less than lring_size */
  int S; /* congruent to rS >> 3 modulo lring_size */

77. Several of the global registers have constant values, because of the way MMIX
has been simplified in this simulator.

Special register rN has a constant value identifying the time of compilation. (The
macro ABSTIME is defined externally in the file abstime.h, which should have just
been created by ABSTIME; ABSTIME is a trivial program that computes the value
of the standard library function time(NULL). We assume that this number, which is the
number of seconds in the “UNIX epoch,” is less than 2^32. Beware: Our assumption
will fail in February of 2106.)

#define VERSION 1 /* version of the MMIX architecture that we support */
#define SUBVERSION 0 /* secondary byte of version number */
#define SUBSUBVERSION 1 /* further qualification to version number */

⟨Initialize everything 14⟩ +≡
  g[rK] = neg_one;
  g[rN].h = (VERSION << 24) + (SUBVERSION << 16) + (SUBSUBVERSION << 8);
  g[rN].l = ABSTIME; /* see comment and warning above */
  g[rT].h = 0x80000005;
  g[rTT].h = 0x80000006;
  g[rV].h = 0x369c2004;
  if (lring_size < 256) lring_size = 256;
  lring_mask = lring_size - 1;
  if (lring_size & lring_mask)
    panic("The number of local registers must be a power of 2");
  l = (octa *) calloc(lring_size, sizeof(octa));
  if (!l) panic("No room for the local registers");
  cur_round = ROUND_NEAR;
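
The power-of-2 test above relies on the fact that n & (n - 1) is zero exactly when n has a single 1 bit. A minimal sketch of the same check as a standalone function follows; the name checked_lring_size and the convention of returning -1 instead of calling panic are assumptions made for illustration.

```c
/* Raise requests below 256 to 256; then accept only powers of 2.
   Returns the ring size to use, or -1 if the request is invalid. */
static int checked_lring_size(int requested)
{
    int size = requested < 256 ? 256 : requested;
    int mask = size - 1;
    return (size & mask) ? -1 : size;
}
```

A request of 512 or 1024 is accepted unchanged, while 300 is rejected because 300 & 299 is nonzero.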

78. In operations like INCH, we want z to be the yz field, shifted left 48 bits. We 
also want y to be register X, which has previously been placed in b; then INCH can 
be simulated as if it were ADDU. 

⟨Set z as an immediate wyde 78⟩ ≡
  {
    switch (op & 3) {
    case 0: z.h = yz << 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz << 16; break;
    case 3: z.l = yz; break;
    }
    y = b;
  }

This code is used in section 71. 
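
The placement of the wyde depends only on the two low-order bits of the opcode, so SETH, INCH, ORH, and ANDNH all share case 0, and so on. The following sketch isolates that step; the function name immediate_wyde is hypothetical, and the octa type is repeated here only to keep the example self-contained.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa; /* high and low tetrabytes */

/* Place the 16-bit field yz into one of the four wyde positions of an
   octabyte, selected by the low two bits of the opcode:
   0 = high, 1 = medium high, 2 = medium low, 3 = low. */
static octa immediate_wyde(unsigned op, uint32_t yz)
{
    octa z = {0, 0};
    switch (op & 3) {
    case 0: z.h = yz << 16; break;
    case 1: z.h = yz; break;
    case 2: z.l = yz << 16; break;
    case 3: z.l = yz; break;
    }
    return z;
}
```

For SETH (opcode #e0) with YZ = #1234, the result has high tetrabyte #12340000 and low tetrabyte zero.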

79. ⟨Set b from special register 79⟩ ≡
  b = g[info[op].third_operand];

This code is used in section 71. 



ABSTIME = macro, abstime.h.
b: octa, §61.
calloc: void *(), <stdlib.h>.
cur_round: int, MMIX-ARITH §30.
h: tetra, §10.
info: op_info [], §65.
l: tetra, §10.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §10.
op: register mmix_opcode, §62.
panic = macro (), §14.
rA = 21, §55.
rB = 0, §55.
rK = 15, §55.
rN = 9, §55.
ROUND_NEAR = 4, §100.
rT = 13, §55.
rTT = 14, §55.
rV = 18, §55.
third_operand: unsigned char, §64.
time: time_t (), <time.h>.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zz: register int, §62.
80. ⟨Install register X as the destination, adjusting the register stack if necessary 80⟩ ≡
  if (xx >= G) {
    sprintf(lhs, "$%d=g[%d]", xx, xx);
    x_ptr = &g[xx];
  } else {
    while (xx >= L) ⟨Increase rL 81⟩;
    sprintf(lhs, "$%d=l[%d]", xx, (O + xx) & lring_mask);
    x_ptr = &l[(O + xx) & lring_mask];
  }

This code is used in section 60. 

81. ⟨Increase rL 81⟩ ≡
  {
    l[(O + L) & lring_mask] = zero_octa;
    L = g[rL].l = L + 1;
    if (((S - O - L) & lring_mask) == 0) stack_store();
  }

This code is used in section 80. 
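
The test that triggers stack_store can be read as "the ring of local registers has become full": S, O, and O + L are positions in the ring, and when S is congruent to O + L modulo the ring size there is no free slot left. A minimal sketch of that predicate, with the hypothetical name ring_is_full, is:

```c
/* True when position O + L has wrapped around to meet S, i.e. when
   S - O - L is a multiple of the ring size (lring_mask + 1). */
static int ring_is_full(int S, int O, int L, int lring_mask)
{
    return ((S - O - L) & lring_mask) == 0;
}
```

With a ring of 256 registers, S = 15, O = 10, and L = 5 give a full ring, whereas S = 8 with the same O and L leaves room. The sketch assumes two's complement integers, so that masking a negative difference behaves as reduction modulo the ring size.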

82. The stack_store routine advances the “gamma” pointer in the ring of local
registers, by storing the oldest local register into memory location rS and advancing
rS.

#define test_store_bkpt(ll) if ((ll)->bkpt & write_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_store ARGS((void));
  void stack_store()
  {
    register mem_tetra *ll = mem_find(g[rS]);
    register int k = S & lring_mask;
    ll->tet = l[k].h; test_store_bkpt(ll);
    (ll + 1)->tet = l[k].l; test_store_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf("             M8[#%08x%08x]=l[%d]=#%08x%08x, rS+=8\n", g[rS].h,
          g[rS].l, k, l[k].h, l[k].l);
    }
    g[rS] = incr(g[rS], 8), S++;
  }

83. The stack_load routine is essentially the inverse of stack_store.

#define test_load_bkpt(ll) if ((ll)->bkpt & read_bit) breakpoint = tracing = true

⟨Subroutines 12⟩ +≡
  void stack_load ARGS((void));
  void stack_load()
  {
    register mem_tetra *ll;
    register int k;
    S--, g[rS] = incr(g[rS], -8);
    ll = mem_find(g[rS]);
    k = S & lring_mask;
    l[k].h = ll->tet; test_load_bkpt(ll);
    l[k].l = (ll + 1)->tet; test_load_bkpt(ll + 1);
    if (stack_tracing) {
      tracing = true;
      if (cur_line) show_line();
      printf("             rS-=8, l[%d]=M8[#%08x%08x]=#%08x%08x\n", k, g[rS].h,
          g[rS].l, l[k].h, l[k].l);
    }
  }



ARGS = macro (), §11.
bkpt: unsigned char, §16.
breakpoint: bool, §61.
cur_line: int, §31.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
l: octa *, §76.
L: register int, §75.
l: tetra, §10.
lhs: char [], §139.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
mem_tetra = struct, §16.
O: register int, §75.
printf: int (), <stdio.h>.
read_bit = macro, §58.
show_line: void (), §47.
sprintf: int (), <stdio.h>.
stack_tracing: bool, §61.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
write_bit = macro, §58.
x_ptr: octa *, §61.
xx: register int, §62.
zero_octa: octa, MMIX-ARITH §4.
84. Simulating the instructions. The master switch branches in 256 directions,
one for each MMIX instruction. 

Let’s start with ADD, since it is somehow the most typical case — not too easy, and 
not too hard. The task is to compute x = y + z, and to signal overflow if the sum is 
out of range. Overflow occurs if and only if y and z have the same sign but the sum 
has a different sign. 

Overflow is one of the eight arithmetic exceptions. We record such exceptions in 
a variable called exc, which is set to zero at the beginning of each cycle and used to 
update rA at the end. 

The main control routine has put the input operands into octabytes y and z. It
has also made x_ptr point to the octabyte where the result should be placed.

⟨Cases for individual MMIX instructions 84⟩ ≡
  case ADD: case ADDI: x = w;    /* w = oplus(y, z) */
    if (((y.h ^ z.h) & sign_bit) == 0 && ((y.h ^ x.h) & sign_bit) != 0) exc |= V_BIT;
  store_x: *x_ptr = x; break;

See also sections 85, 86, 87, 88, 89, 90, 92, 93, 94, 95, 96, 97, 101, 102, 104, 106, 107, 108, and 124. 
This code is used in section 60. 
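
The sign rule for ADD can be checked on ordinary 64-bit integers. This is a minimal sketch, assuming a single uint64_t in place of the simulator's two-tetra octa; the function name add_overflows is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes y + z modulo 2^64 and reports signed overflow using the rule
   stated above: the operands agree in sign but the sum does not. */
static bool add_overflows(uint64_t y, uint64_t z, uint64_t *sum)
{
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t x = y + z;                       /* unsigned addition wraps */
    *sum = x;
    return ((y ^ z) & sign) == 0 && ((y ^ x) & sign) != 0;
}
```

Only the top bit of each value takes part in the test, which is why the simulator needs to look only at the high tetras y.h, z.h, and x.h.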

85. Other cases of signed and unsigned addition and subtraction are, of course,
similar. Overflow occurs in the calculation x = y - z if and only if it occurs in the
calculation y = x + z.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SUB: case SUBI: case NEG: case NEGI: x = ominus(y, z);
    if (((x.h ^ z.h) & sign_bit) == 0 && ((x.h ^ y.h) & sign_bit) != 0) exc |= V_BIT;
    goto store_x;
  case ADDU: case ADDUI: case INCH: case INCMH: case INCML: case INCL: x = w;
    goto store_x;
  case SUBU: case SUBUI: case NEGU: case NEGUI: x = ominus(y, z); goto store_x;
  case IIADDU: case IIADDUI: case IVADDU: case IVADDUI: case VIIIADDU:
  case VIIIADDUI: case XVIADDU: case XVIADDUI:
    x = oplus(shift_left(y, ((op & 0xf) >> 1) - 3), z); goto store_x;
  case SETH: case SETMH: case SETML: case SETL: case GETA: case GETAB: x = z;
    goto store_x;
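
The expression ((op & 0xf) >> 1) - 3 turns the opcode of a scaled addition into its shift count. A small sketch, assuming 64-bit integers instead of octa pairs and using the opcode values from the mini-index; scale_shift and scaled_addu are hypothetical names.

```c
#include <stdint.h>

/* Opcode values as listed in section 54. */
enum { IIADDU = 0x28, IVADDU = 0x2a, VIIIADDU = 0x2c, XVIADDU = 0x2e };

/* Shift count (1, 2, 3, or 4) encoded in the low opcode bits;
   the lowest bit only selects the immediate variant. */
static int scale_shift(int op)
{
    return ((op & 0xf) >> 1) - 3;
}

/* x = (y << shift) + z, with wraparound as in unsigned arithmetic. */
static uint64_t scaled_addu(int op, uint64_t y, uint64_t z)
{
    return (y << scale_shift(op)) + z;
}
```

For example, scaled_addu(XVIADDU, 3, 5) computes 16 * 3 + 5.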

86. Let’s get the simple bitwise operations out of the way too. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case OR: case ORI: case ORH: case ORMH: case ORML: case ORL: x.h = y.h | z.h;
    x.l = y.l | z.l; goto store_x;
  case ORN: case ORNI: x.h = y.h | ~z.h; x.l = y.l | ~z.l; goto store_x;
  case NOR: case NORI: x.h = ~(y.h | z.h); x.l = ~(y.l | z.l); goto store_x;
  case XOR: case XORI: x.h = y.h ^ z.h; x.l = y.l ^ z.l; goto store_x;
  case AND: case ANDI: x.h = y.h & z.h; x.l = y.l & z.l; goto store_x;
  case ANDN: case ANDNI: case ANDNH: case ANDNMH: case ANDNML: case ANDNL:
    x.h = y.h & ~z.h; x.l = y.l & ~z.l; goto store_x;
  case NAND: case NANDI: x.h = ~(y.h & z.h); x.l = ~(y.l & z.l); goto store_x;
  case NXOR: case NXORI: x.h = ~(y.h ^ z.h); x.l = ~(y.l ^ z.l); goto store_x;
ADD = #20, §54.
ADDI = #21, §54.
ADDU = #22, §54.
ADDUI = #23, §54.
AND = #c8, §54.
ANDI = #c9, §54.
ANDN = #ca, §54.
ANDNH = #ec, §54.
ANDNI = #cb, §54.
ANDNL = #ef, §54.
ANDNMH = #ed, §54.
ANDNML = #ee, §54.
exc: int, §61.
GETA = #f4, §54.
GETAB = #f5, §54.
h: tetra, §10.
IIADDU = #28, §54.
IIADDUI = #29, §54.
INCH = #e4, §54.
INCL = #e7, §54.
INCMH = #e5, §54.
INCML = #e6, §54.
IVADDU = #2a, §54.
IVADDUI = #2b, §54.
l: tetra, §10.
NAND = #cc, §54.
NANDI = #cd, §54.
NEG = #34, §54.
NEGI = #35, §54.
NEGU = #36, §54.
NEGUI = #37, §54.
NOR = #c4, §54.
NORI = #c5, §54.
NXOR = #ce, §54.
NXORI = #cf, §54.
ominus: octa (), MMIX-ARITH §5.
op: register mmix_opcode, §62.
oplus: octa (), MMIX-ARITH §5.
OR = #c0, §54.
ORH = #e8, §54.
ORI = #c1, §54.
ORL = #eb, §54.
ORMH = #e9, §54.
ORML = #ea, §54.
ORN = #c2, §54.
ORNI = #c3, §54.
SETH = #e0, §54.
SETL = #e3, §54.
SETMH = #e1, §54.
SETML = #e2, §54.
shift_left: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
SUB = #24, §54.
SUBI = #25, §54.
SUBU = #26, §54.
SUBUI = #27, §54.
V_BIT = macro, §57.
VIIIADDU = #2c, §54.
VIIIADDUI = #2d, §54.
w: octa, §61.
x: octa, §61.
x_ptr: octa *, §61.
XOR = #c6, §54.
XORI = #c7, §54.
XVIADDU = #2e, §54.
XVIADDUI = #2f, §54.
y: octa, §61.
z: octa, §61.
87. The less simple bit manipulations are almost equally simple, given the subroutines
of MMIX-ARITH. The MUX operation has three inputs; in such cases the inputs
appear in y, z, and b.

#define shift_amt (z.h || z.l >= 64 ? 64 : z.l)

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SL: case SLI: x = shift_left(y, shift_amt);
    a = shift_right(x, shift_amt, 0);
    if (a.h != y.h || a.l != y.l) exc |= V_BIT;
    goto store_x;
  case SLU: case SLUI: x = shift_left(y, shift_amt); goto store_x;
  case SR: case SRI: case SRU: case SRUI: x = shift_right(y, shift_amt, op & 0x2);
    goto store_x;
  case MUX: case MUXI: x.h = (y.h & b.h) | (z.h & ~b.h); x.l = (y.l & b.l) | (z.l & ~b.l);
    goto store_x;
  case SADD: case SADDI: x.l = count_bits(y.h & ~z.h) + count_bits(y.l & ~z.l); goto store_x;
  case MOR: case MORI: x = bool_mult(y, z, false); goto store_x;
  case MXOR: case MXORI: x = bool_mult(y, z, true); goto store_x;
  case BDIF: case BDIFI: x.h = byte_diff(y.h, z.h); x.l = byte_diff(y.l, z.l); goto store_x;
  case WDIF: case WDIFI: x.h = wyde_diff(y.h, z.h); x.l = wyde_diff(y.l, z.l); goto store_x;
  case TDIF: case TDIFI: if (y.h > z.h) x.h = y.h - z.h;
  tdif_l: if (y.l > z.l) x.l = y.l - z.l; goto store_x;
  case ODIF: case ODIFI: if (y.h > z.h) x = ominus(y, z);
    else if (y.h == z.h) goto tdif_l;
    goto store_x;
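
The overflow test for SL shifts the result back with sign extension and compares it with the original operand. The sketch below reproduces that idea on uint64_t values; shl64, sar64, and sl_overflows are hypothetical helpers standing in for shift_left, shift_right, and the SL case, and the clamp mirrors shift_amt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shift counts of 64 or more are clamped to 64, as shift_amt does. */
static uint64_t clamp_shift(uint64_t z) { return z >= 64 ? 64 : z; }

static uint64_t shl64(uint64_t y, uint64_t s) { return s >= 64 ? 0 : y << s; }

/* Arithmetic (sign-extending) right shift, written without relying on
   implementation-defined behavior of >> on negative signed values. */
static uint64_t sar64(uint64_t y, uint64_t s)
{
    uint64_t fill = (y >> 63) ? UINT64_MAX : 0;
    if (s >= 64) return fill;
    if (s == 0) return y;
    return (y >> s) | (fill << (64 - s));
}

/* Stores y << z in *x; returns true when signed information was lost. */
static bool sl_overflows(uint64_t y, uint64_t z, uint64_t *x)
{
    uint64_t s = clamp_shift(z);
    *x = shl64(y, s);
    return sar64(*x, s) != y;
}
```

Shifting 1 left by 63 places overflows, because the result has become negative; shifting -1 left by 63 places does not, because the result is the smallest representable integer and shifts back to -1.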

88. When an operation has two outputs, the primary output is placed in x and the 
auxiliary output is placed in a. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case MUL: case MULI: x = signed_omult(y, z);
  test_overflow: if (overflow) exc |= V_BIT;
    goto store_x;
  case MULU: case MULUI: x = omult(y, z); a = g[rH] = aux; goto store_x;
  case DIV: case DIVI: if (!z.l && !z.h) aux = y, exc |= D_BIT, overflow = false;
    else x = signed_odiv(y, z);
    a = g[rR] = aux; goto test_overflow;
  case DIVU: case DIVUI: x = odiv(b, y, z); a = g[rR] = aux; goto store_x;

89. The floating point routines of MMIX-ARITH record exceptional events in a 
variable called exceptions. Here we simply merge those bits into the exc variable. The 
U_BIT is not exactly the same as “underflow,” but the true definition of underflow 
will be applied when exc is combined with rA. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case FADD: x = fplus(y, z);
  fin_float: round_mode = cur_round;
  store_fx: exc |= exceptions; goto store_x;
  case FSUB: a = z; if (fcomp(a, zero_octa) != 2) a.h ^= sign_bit;
    x = fplus(y, a); goto fin_float;
  case FMUL: x = fmult(y, z); goto fin_float;
  case FDIV: x = fdivide(y, z); goto fin_float;
  case FREM: x = fremstep(y, z, 2500); goto fin_float;
  case FSQRT: x = froot(z, y.l);
  fin_unifloat: if (y.h || y.l > 4) goto illegal_inst;
    round_mode = (y.l ? y.l : cur_round); goto store_fx;
  case FINT: x = fintegerize(z, y.l); goto fin_unifloat;
  case FIX: x = fixit(z, y.l); goto fin_unifloat;
  case FIXU: x = fixit(z, y.l); exceptions &= ~W_BIT; goto fin_unifloat;
  case FLOT: case FLOTI: case FLOTU: case FLOTUI: case SFLOT: case SFLOTI:
  case SFLOTU: case SFLOTUI: x = floatit(z, y.l, op & 0x2, op & 0x4); goto fin_unifloat;



a: octa, §61.
aux: octa, MMIX-ARITH §4.
b: octa, §61.
BDIF = #d0, §54.
BDIFI = #d1, §54.
bool_mult: octa (), MMIX-ARITH §29.
byte_diff: tetra (), MMIX-ARITH §27.
count_bits: int (), MMIX-ARITH §26.
cur_round: int, MMIX-ARITH §30.
D_BIT = macro, §57.
DIV = #1c, §54.
DIVI = #1d, §54.
DIVU = #1e, §54.
DIVUI = #1f, §54.
exc: int, §61.
exceptions: int, MMIX-ARITH §32.
FADD = #04, §54.
false = 0, §9.
fcomp: int (), MMIX-ARITH §85.
FDIV = #14, §54.
fdivide: octa (), MMIX-ARITH §44.
FINT = #17, §54.
fintegerize: octa (), MMIX-ARITH §86.
FIX = #05, §54.
fixit: octa (), MMIX-ARITH §88.
FIXU = #07, §54.
floatit: octa (), MMIX-ARITH §89.
FLOT = #08, §54.
FLOTI = #09, §54.
FLOTU = #0a, §54.
FLOTUI = #0b, §54.
FMUL = #10, §54.
fmult: octa (), MMIX-ARITH §41.
fplus: octa (), MMIX-ARITH §46.
FREM = #16, §54.
fremstep: octa (), MMIX-ARITH §93.
froot: octa (), MMIX-ARITH §91.
FSQRT = #15, §54.
FSUB = #06, §54.
g: octa [], §76.
h: tetra, §10.
illegal_inst: label, §107.
l: tetra, §10.
MOR = #dc, §54.
MORI = #dd, §54.
MUL = #18, §54.
MULI = #19, §54.
MULU = #1a, §54.
MULUI = #1b, §54.
MUX = #d8, §54.
MUXI = #d9, §54.
MXOR = #de, §54.
MXORI = #df, §54.
ODIF = #d6, §54.
ODIFI = #d7, §54.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
op: register mmix_opcode, §62.
overflow: bool, MMIX-ARITH §4.
rH = 3, §55.
round_mode: int, §61.
rR = 6, §55.
SADD = #da, §54.
SADDI = #db, §54.
SFLOT = #0c, §54.
SFLOTI = #0d, §54.
SFLOTU = #0e, §54.
SFLOTUI = #0f, §54.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
signed_odiv: octa (), MMIX-ARITH §24.
signed_omult: octa (), MMIX-ARITH §12.
SL = #38, §54.
SLI = #39, §54.
SLU = #3a, §54.
SLUI = #3b, §54.
SR = #3c, §54.
SRI = #3d, §54.
SRU = #3e, §54.
SRUI = #3f, §54.
store_x: label, §84.
TDIF = #d4, §54.
TDIFI = #d5, §54.
true = 1, §9.
U_BIT = macro, §57.
V_BIT = macro, §57.
W_BIT = macro, §57.
WDIF = #d2, §54.
WDIFI = #d3, §54.
wyde_diff: tetra (), MMIX-ARITH §28.
x: octa, §61.
y: octa, §61.
z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
90. We have now done all of the arithmetic operations except for the cases that
compare two registers and yield a value of -1 or 0 or 1.

#define cmp_zero store_x    /* x is 0 by default */

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CMP: case CMPI: if ((y.h & sign_bit) > (z.h & sign_bit)) goto cmp_neg;
    if ((y.h & sign_bit) < (z.h & sign_bit)) goto cmp_pos;
  case CMPU: case CMPUI: if (y.h < z.h) goto cmp_neg;
    if (y.h > z.h) goto cmp_pos;
    if (y.l < z.l) goto cmp_neg;
    if (y.l == z.l) goto cmp_zero;
  cmp_pos: x.l = 1; goto store_x;
  cmp_neg: x = neg_one; goto store_x;
  case FCMPE: k = fepscomp(y, z, b, true);
    if (k) goto cmp_zero_or_invalid;
  case FCMP: k = fcomp(y, z);
    if (k < 0) goto cmp_neg;
  cmp_fin: if (k == 1) goto cmp_pos;
  cmp_zero_or_invalid: if (k == 2) exc |= I_BIT;
    goto cmp_zero;
  case FUN: if (fcomp(y, z) == 2) goto cmp_pos; else goto cmp_zero;
  case FEQL: if (fcomp(y, z) == 0) goto cmp_pos; else goto cmp_zero;
  case FEQLE: k = fepscomp(y, z, b, false);
    goto cmp_fin;
  case FUNE: if (fepscomp(y, z, b, true) == 2) goto cmp_pos; else goto cmp_zero;
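
The CMP case settles the comparison from the sign bits alone when they differ, and otherwise falls through to the unsigned comparison of CMPU. A self-contained sketch of that structure follows; the octa type is modeled on the struct of section 10, while make_octa, cmpu, and cmp are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa;   /* high and low tetrabytes */

#define SIGN_BIT UINT32_C(0x80000000)

static octa make_octa(uint32_t h, uint32_t l)
{
    octa o;
    o.h = h; o.l = l;
    return o;
}

/* Unsigned comparison: -1, 0, or 1. */
static int cmpu(octa y, octa z)
{
    if (y.h < z.h) return -1;
    if (y.h > z.h) return 1;
    if (y.l < z.l) return -1;
    return y.l == z.l ? 0 : 1;
}

/* Signed comparison: a set sign bit means "negative", hence smaller.
   When the signs agree, two's complement order equals unsigned order. */
static int cmp(octa y, octa z)
{
    if ((y.h & SIGN_BIT) > (z.h & SIGN_BIT)) return -1;
    if ((y.h & SIGN_BIT) < (z.h & SIGN_BIT)) return 1;
    return cmpu(y, z);
}
```

Comparing the octabyte of all ones with 1 gives -1 under cmp (it represents -1) but 1 under cmpu.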

91. We have now done all the register-register operations except for the conditional 
commands. Conditional commands and branch commands all make use of a simple 
subroutine that determines whether a given octabyte satisfies the condition of a given 
opcode. 

⟨Subroutines 12⟩ +≡
  int register_truth ARGS((octa, mmix_opcode));
  int register_truth(o, op)
    octa o;
    mmix_opcode op;
  { register int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = o.h >> 31; break;    /* negative? */
    case 1: b = (o.h == 0 && o.l == 0); break;    /* zero? */
    case 2: b = (o.h < sign_bit && (o.h || o.l)); break;    /* positive? */
    case 3: b = o.l & 0x1; break;    /* odd? */
    }
    if (op & 0x8) return b ^ 1;
    else return b;
  }
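
The condition decoding can be exercised on its own. The following sketch assumes a single uint64_t operand instead of an octa; the function name truth is hypothetical, and the opcode values used in the usage note come from section 54.

```c
#include <stdint.h>

/* Bits 1-2 of the opcode choose the test (negative, zero, positive, odd);
   bit 3 negates it; bit 0 (immediate or backward variant) is ignored. */
static int truth(uint64_t o, int op)
{
    int b;
    switch ((op >> 1) & 0x3) {
    case 0: b = (int)(o >> 63); break;              /* negative? */
    case 1: b = (o == 0); break;                    /* zero? */
    case 2: b = ((o >> 63) == 0 && o != 0); break;  /* positive? */
    default: b = (int)(o & 1); break;               /* odd? */
    }
    return (op & 0x8) ? b ^ 1 : b;
}
```

With BOD = #46 and BEV = #4e, truth(7, 0x46) is 1 and truth(7, 0x4e) is 0; the same function serves the CS and ZS opcodes, since only the low four bits matter.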
92. The b operand will be zero on the ZS operations; it will be the contents of
register X on the CS operations.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSN: case CSNI: case CSZ: case CSZI:
  case CSP: case CSPI: case CSOD: case CSODI:
  case CSNN: case CSNNI: case CSNZ: case CSNZI:
  case CSNP: case CSNPI: case CSEV: case CSEVI:
  case ZSN: case ZSNI: case ZSZ: case ZSZI:
  case ZSP: case ZSPI: case ZSOD: case ZSODI:
  case ZSNN: case ZSNNI: case ZSNZ: case ZSNZI:
  case ZSNP: case ZSNPI: case ZSEV: case ZSEVI:
    x = register_truth(y, op) ? z : b; goto store_x;

ARGS = macro (), §11.
b: octa, §61.
CMP = #30, §54.
CMPI = #31, §54.
CMPU = #32, §54.
CMPUI = #33, §54.
CSEV = #6e, §54.
CSEVI = #6f, §54.
CSN = #60, §54.
CSNI = #61, §54.
CSNN = #68, §54.
CSNNI = #69, §54.
CSNP = #6c, §54.
CSNPI = #6d, §54.
CSNZ = #6a, §54.
CSNZI = #6b, §54.
CSOD = #66, §54.
CSODI = #67, §54.
CSP = #64, §54.
CSPI = #65, §54.
CSZ = #62, §54.
CSZI = #63, §54.
exc: int, §61.
false = 0, §9.
FCMP = #01, §54.
FCMPE = #11, §54.
fcomp: int (), MMIX-ARITH §85.
fepscomp: int (), MMIX-ARITH §50.
FEQL = #03, §54.
FEQLE = #13, §54.
FUN = #02, §54.
FUNE = #12, §54.
h: tetra, §10.
I_BIT = macro, §57.
k: register int, §62.
l: tetra, §10.
mmix_opcode = enum, §54.
neg_one: octa, MMIX-ARITH §4.
octa = struct, §10.
op: register mmix_opcode, §62.
sign_bit = macro, §15.
store_x: label, §84.
true = 1, §9.
x: octa, §61.
y: octa, §61.
z: octa, §61.
ZSEV = #7e, §54.
ZSEVI = #7f, §54.
ZSN = #70, §54.
ZSNI = #71, §54.
ZSNN = #78, §54.
ZSNNI = #79, §54.
ZSNP = #7c, §54.
ZSNPI = #7d, §54.
ZSNZ = #7a, §54.
ZSNZI = #7b, §54.
ZSOD = #76, §54.
ZSODI = #77, §54.
ZSP = #74, §54.
ZSPI = #75, §54.
ZSZ = #72, §54.
ZSZI = #73, §54.
93. Didn’t that feel good, when 32 opcodes reduced to a single case? We get to do
it one more time. Happiness!

⟨Cases for individual MMIX instructions 84⟩ +≡
  case BN: case BNB: case BZ: case BZB:
  case BP: case BPB: case BOD: case BODB:
  case BNN: case BNNB: case BNZ: case BNZB:
  case BNP: case BNPB: case BEV: case BEVB:
  case PBN: case PBNB: case PBZ: case PBZB:
  case PBP: case PBPB: case PBOD: case PBODB:
  case PBNN: case PBNNB: case PBNZ: case PBNZB:
  case PBNP: case PBNPB: case PBEV: case PBEVB:
    x.l = register_truth(b, op);
    if (x.l) {
      inst_ptr = z;
      good = (op >= PBN);
    } else good = (op < PBN);
    if (good) good_guesses++;
    else {
      bad_guesses++, sclock.l += 2;    /* penalty is 2υ for bad guess */
      if (g[rI].l <= 2 && g[rI].l && g[rI].h == 0) tracing = breakpoint = true;
      g[rI] = incr(g[rI], -2);
    }
    break;
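
The bookkeeping for branch guesses depends only on the opcode range and on whether the branch was taken: opcodes below PBN predict "not taken", the probable-branch opcodes from PBN upward predict "taken". A minimal sketch, in which the stats struct and record_branch are hypothetical and the rI countdown of the real code is left out:

```c
#include <stdint.h>

enum { PBN = 0x50 };                 /* first probable-branch opcode, section 54 */

typedef struct {
    unsigned good, bad;              /* counts of correct and incorrect guesses */
    uint64_t clock;                  /* simulated time in units of υ */
} stats;

/* A guess is good when a probable branch is taken or an ordinary branch
   is not; a bad guess costs two extra time units. */
static void record_branch(stats *s, int op, int taken)
{
    int good = taken ? (op >= PBN) : (op < PBN);
    if (good) s->good++;
    else { s->bad++; s->clock += 2; }
}
```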

94. Memory operations are next on our agenda. The memory address, y + z, has
already been placed in w.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case LDB: case LDBI: case LDBU: case LDBUI:
    i = 56; j = (w.l & 0x3) << 3;
    goto fin_ld;
  case LDW: case LDWI: case LDWU: case LDWUI:
    i = 48; j = (w.l & 0x2) << 3;
    goto fin_ld;
  case LDT: case LDTI: case LDTU: case LDTUI:
    i = 32; j = 0; goto fin_ld;
  case LDHT: case LDHTI: i = j = 0;
  fin_ld: ll = mem_find(w); test_load_bkpt(ll);
    x.h = ll->tet;
    x = shift_right(shift_left(x, j), i, op & 0x2);
  check_ld: if (w.h & sign_bit) goto privileged_inst;
    goto store_x;
  case LDO: case LDOI: case LDOU: case LDOUI: case LDUNC: case LDUNCI: w.l &= -8;
    ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    x.h = ll->tet; x.l = (ll + 1)->tet;
    goto check_ld;
  case LDSF: case LDSFI: ll = mem_find(w); test_load_bkpt(ll);
    x = load_sf(ll->tet); goto check_ld;
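
The pair (i, j) drives a two-step extraction: shifting left by j moves the addressed byte or wyde to the top of the octabyte, and shifting right by i brings it down with either sign extension or zero fill. The sketch below assumes a 64-bit integer in place of the octa x and big-endian numbering of bytes within the tetra, as in MMIX; load_field and its parameters are hypothetical.

```c
#include <stdint.h>

/* Extracts a field of 1, 2, or 4 bytes from the tetrabyte tet that
   contains address addr.  is_unsigned selects zero fill, otherwise
   the field is sign-extended to 64 bits. */
static uint64_t load_field(uint32_t tet, uint32_t addr, int size, int is_unsigned)
{
    int i, j;
    switch (size) {
    case 1: i = 56; j = (int)(addr & 0x3) << 3; break;
    case 2: i = 48; j = (int)(addr & 0x2) << 3; break;
    default: i = 32; j = 0; break;
    }
    uint64_t v = ((uint64_t)tet << 32) << j;  /* field now starts at bit 63 */
    uint64_t r = v >> i;                      /* logical shift: zero fill */
    if (!is_unsigned && (v >> 63))
        r |= ~(UINT64_MAX >> i);              /* replicate the sign bit */
    return r;
}
```

With tet = 0x1280ff7f, the byte at offset 1 is 0x80: an unsigned load yields 0x80 and a signed load yields 0xffffffffffffff80.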
b: octa, §61.
bad_guesses: int, §139.
BEV = #4e, §54.
BEVB = #4f, §54.
BN = #40, §54.
BNB = #41, §54.
BNN = #48, §54.
BNNB = #49, §54.
BNP = #4c, §54.
BNPB = #4d, §54.
BNZ = #4a, §54.
BNZB = #4b, §54.
BOD = #46, §54.
BODB = #47, §54.
BP = #44, §54.
BPB = #45, §54.
breakpoint: bool, §61.
BZ = #42, §54.
BZB = #43, §54.
g: octa [], §76.
good: bool, §61.
good_guesses: int, §139.
h: tetra, §10.
i: register int, §62.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
l: tetra, §10.
LDB = #80, §54.
LDBI = #81, §54.
LDBU = #82, §54.
LDBUI = #83, §54.
LDHT = #92, §54.
LDHTI = #93, §54.
LDO = #8c, §54.
LDOI = #8d, §54.
LDOU = #8e, §54.
LDOUI = #8f, §54.
LDSF = #90, §54.
LDSFI = #91, §54.
LDT = #88, §54.
LDTI = #89, §54.
LDTU = #8a, §54.
LDTUI = #8b, §54.
LDUNC = #96, §54.
LDUNCI = #97, §54.
LDW = #84, §54.
LDWI = #85, §54.
LDWU = #86, §54.
LDWUI = #87, §54.
ll: register mem_tetra *, §62.
load_sf: octa (), MMIX-ARITH §39.
mem_find: mem_tetra *(), §20.
op: register mmix_opcode, §62.
PBEV = #5e, §54.
PBEVB = #5f, §54.
PBN = #50, §54.
PBNB = #51, §54.
PBNN = #58, §54.
PBNNB = #59, §54.
PBNP = #5c, §54.
PBNPB = #5d, §54.
PBNZ = #5a, §54.
PBNZB = #5b, §54.
PBOD = #56, §54.
PBODB = #57, §54.
PBP = #54, §54.
PBPB = #55, §54.
PBZ = #52, §54.
PBZB = #53, §54.
privileged_inst: label, §107.
register_truth: int (), §91.
rI = 12, §55.
sclock: octa, §19.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
store_x: label, §84.
test_load_bkpt = macro (), §83.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
w: octa, §61.
x: octa, §61.
y: octa, §61.
z: octa, §61.
95. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case STB: case STBI: case STBU: case STBUI:
    i = 56; j = (w.l & 0x3) << 3;
    goto fin_pst;
  case STW: case STWI: case STWU: case STWUI:
    i = 48; j = (w.l & 0x2) << 3;
    goto fin_pst;
  case STT: case STTI: case STTU: case STTUI:
    i = 32; j = 0;
  fin_pst: ll = mem_find(w);
    if ((op & 0x2) == 0) {
      a = shift_right(shift_left(b, i), i, 0);
      if (a.h != b.h || a.l != b.l) exc |= V_BIT;
    }
    ll->tet ^= (ll->tet ^ (b.l << (i - 32 - j))) & ((((tetra) -1) << (i - 32)) >> j);
    goto fin_st;
  case STSF: case STSFI: ll = mem_find(w);
    ll->tet = store_sf(b); exc = exceptions;
    goto fin_st;
  case STHT: case STHTI: ll = mem_find(w); ll->tet = b.h;
  fin_st: test_store_bkpt(ll);
    w.l &= -8; ll = mem_find(w);
    a.h = ll->tet; a.l = (ll + 1)->tet;    /* for trace output */
    goto check_st;
  case STCO: case STCOI: b.l = xx;
  case STO: case STOI: case STOU: case STOUI: case STUNC: case STUNCI: w.l &= -8;
    ll = mem_find(w);
    test_store_bkpt(ll); test_store_bkpt(ll + 1);
    ll->tet = b.h; (ll + 1)->tet = b.l;
  check_st: if (w.h & sign_bit) goto privileged_inst;
    break;
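
The one-line update at fin_pst merges a byte, wyde, or tetra into the containing tetrabyte with the identity t ^ ((t ^ v) & m), which replaces exactly the bits of t selected by the mask m. A sketch of the same computation as a pure function; store_field and its parameters are hypothetical names.

```c
#include <stdint.h>

/* Returns tet with a field of 1, 2, or 4 bytes at address addr replaced
   by the low-order bits of val (big-endian byte order within the tetra). */
static uint32_t store_field(uint32_t tet, uint32_t val, uint32_t addr, int size)
{
    int i, j;
    switch (size) {
    case 1: i = 56; j = (int)(addr & 0x3) << 3; break;
    case 2: i = 48; j = (int)(addr & 0x2) << 3; break;
    default: i = 32; j = 0; break;
    }
    uint32_t mask = (UINT32_C(0xffffffff) << (i - 32)) >> j;  /* bits to replace */
    uint32_t v = val << (i - 32 - j);                         /* field in position */
    return tet ^ ((tet ^ v) & mask);
}
```

Storing the byte 0xab at offset 1 of 0x11223344 gives 0x11ab3344; bits of val above the field width fall outside the mask and are discarded.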

96. The CSWAP operation has elements of both loading and storing. We shuffle some 
of the operands around so that they will appear correctly in the trace output. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case CSWAP: case CSWAPI: w.l &= -8; ll = mem_find(w);
    test_load_bkpt(ll); test_load_bkpt(ll + 1);
    a = g[rP];
    if (ll->tet == a.h && (ll + 1)->tet == a.l) {
      x.h = 0, x.l = 1;
      test_store_bkpt(ll); test_store_bkpt(ll + 1);
      ll->tet = b.h, (ll + 1)->tet = b.l;
      strcpy(rhs, "M8[%#w]=%#b");
    } else {
      b.h = ll->tet, b.l = (ll + 1)->tet;
      g[rP] = b;
      strcpy(rhs, "rP=%#b");
    }
    goto check_ld;
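
Stripped of tracing and breakpoints, CSWAP is a compare-and-swap against the prediction register rP. The following sketch shows the two outcomes with plain 64-bit values; the function cswap and its pointer parameters are hypothetical, and no atomicity is implied because the simulator is single-threaded.

```c
#include <stdint.h>

/* If *mem equals *rP, store *reg_x into memory and set *reg_x to 1;
   otherwise copy memory into *rP and set *reg_x to 0. */
static void cswap(uint64_t *mem, uint64_t *rP, uint64_t *reg_x)
{
    if (*mem == *rP) {
        *mem = *reg_x;
        *reg_x = 1;
    } else {
        *rP = *mem;
        *reg_x = 0;
    }
}
```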
97. The GET command is permissive, but PUT is restrictive.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GET: if (yy != 0 || zz >= 32) goto illegal_inst;
    x = g[zz];
    goto store_x;
  case PUT: case PUTI: if (yy != 0 || xx >= 32) goto illegal_inst;
    strcpy(rhs, "%z = %#z");
    if (xx >= 8) {
      if (xx <= 11 && xx != 8) goto illegal_inst;    /* can’t change rN, rO, rS */
      if (xx <= 18) goto privileged_inst;
      if (xx == rA) ⟨Get ready to update rA 100⟩
      else if (xx == rL) ⟨Set L = z = min(z, L) 98⟩
      else if (xx == rG) ⟨Get ready to update rG 99⟩;
    }
    g[xx] = z; zz = xx; break;

98. ⟨Set L = z = min(z, L) 98⟩ ≡
  {
    x = z; strcpy(rhs, z.h ? "min(rL,%#x) = %z" : "min(rL,%x) = %z");
    if (z.l > L || z.h) z.h = 0, z.l = L;
    else old_L = L = z.l;
  }

This code is used in section 97. 



a: octa, §61.
b: octa, §61.
check_ld: label, §94.
CSWAP = #94, §54.
CSWAPI = #95, §54.
exc: int, §61.
exceptions: int, MMIX-ARITH §32.
g: octa [], §76.
GET = #fe, §54.
h: tetra, §10.
i: register int, §62.
illegal_inst: label, §107.
j: register int, §62.
l: tetra, §10.
L: register int, §75.
ll: register mem_tetra *, §62.
mem_find: mem_tetra *(), §20.
old_L: int, §61.
op: register mmix_opcode, §62.
privileged_inst: label, §107.
PUT = #f6, §54.
PUTI = #f7, §54.
rA = 21, §55.
rG = 19, §55.
rhs = macro, §139.
rL = 20, §55.
rP = 23, §55.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
sign_bit = macro, §15.
STB = #a0, §54.
STBI = #a1, §54.
STBU = #a2, §54.
STBUI = #a3, §54.
STCO = #b4, §54.
STCOI = #b5, §54.
STHT = #b2, §54.
STHTI = #b3, §54.
STO = #ac, §54.
STOI = #ad, §54.
store_sf: tetra (), MMIX-ARITH §40.
store_x: label, §84.
STOU = #ae, §54.
STOUI = #af, §54.
strcpy: char *(), <string.h>.
STSF = #b0, §54.
STSFI = #b1, §54.
STT = #a8, §54.
STTI = #a9, §54.
STTU = #aa, §54.
STTUI = #ab, §54.
STUNC = #b6, §54.
STUNCI = #b7, §54.
STW = #a4, §54.
STWI = #a5, §54.
STWU = #a6, §54.
STWUI = #a7, §54.
test_load_bkpt = macro (), §83.
test_store_bkpt = macro (), §82.
tet: tetra, §16.
tetra = unsigned int, §10.
V_BIT = macro, §57.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
99. ⟨Get ready to update rG 99⟩ ≡
  {
    if (z.h != 0 || z.l > 255 || z.l < L || z.l < 32) goto illegal_inst;
    for (j = z.l; j < G; j++) g[j] = zero_octa;
    G = z.l;
  }

This code is used in section 97. 

100. #define ROUND_OFF 1
#define ROUND_UP 2
#define ROUND_DOWN 3
#define ROUND_NEAR 4

⟨Get ready to update rA 100⟩ ≡
  {
    if (z.h != 0 || z.l >= 0x40000) goto illegal_inst;
    cur_round = (z.l >= 0x10000 ? z.l >> 16 : ROUND_NEAR);
  }

This code is used in section 97. 
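
The rounding mode occupies the two bits of rA just above the sixteen low-order bits, and a zero field selects ROUND_NEAR. A small sketch of that decoding follows; the function round_mode_from_rA and its -1 return value for an illegal setting are hypothetical conventions (the code above jumps to illegal_inst instead).

```c
#include <stdint.h>

enum { ROUND_OFF = 1, ROUND_UP = 2, ROUND_DOWN = 3, ROUND_NEAR = 4 };

/* Decodes the rounding mode from the low tetrabyte of a value destined
   for rA; returns -1 when bits beyond the 18 permitted ones are set. */
static int round_mode_from_rA(uint32_t lo)
{
    if (lo >= 0x40000) return -1;
    return lo >= 0x10000 ? (int)(lo >> 16) : ROUND_NEAR;
}
```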

101. Pushing and popping are rather delicate, because we want to trace them 
coherently. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case PUSHGO: case PUSHGOI: inst_ptr = w; goto push;
  case PUSHJ: case PUSHJB: inst_ptr = z;
  push: if (xx >= G) {
      xx = L++;
      if (((S - O - L) & lring_mask) == 0) stack_store();
    }
    x.l = xx; l[(O + xx) & lring_mask] = x;    /* the “hole” records the amount pushed */
    sprintf(lhs, "l[%d]=%d, ", (O + xx) & lring_mask, xx);
    x = g[rJ] = incr(loc, 4);
    L -= xx + 1; O += xx + 1;
    b = g[rO] = incr(g[rO], (xx + 1) << 3);
  sync_L: a.l = g[rL].l = L; break;
  case POP: if (xx != 0 && xx <= L) y = l[(O + xx - 1) & lring_mask];
    if (g[rS].l == g[rO].l) stack_load();
    k = l[(O - 1) & lring_mask].l & 0xff;
    while ((tetra)(O - S) <= (tetra) k) stack_load();
    L = k + (xx <= L ? xx : L + 1);
    if (L > G) L = G;
    if (L > k) {
      l[(O - 1) & lring_mask] = y;
      if (y.h) sprintf(lhs, "l[%d]=#%x%08x, ", (O - 1) & lring_mask, y.h, y.l);
      else sprintf(lhs, "l[%d]=#%x, ", (O - 1) & lring_mask, y.l);
    } else lhs[0] = '\0';
    y = g[rJ]; z.l = yz << 2; inst_ptr = oplus(y, z);
    O -= k + 1; b = g[rO] = incr(g[rO], -((k + 1) << 3));
    goto sync_L;
102. To complete our simulation of MMIX’s register stack, we need to implement
SAVE and UNSAVE.

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SAVE: if (xx < G || yy != 0 || zz != 0) goto illegal_inst;
    l[(O + L) & lring_mask].l = L, L++;
    if (((S - O - L) & lring_mask) == 0) stack_store();
    O += L; g[rO] = incr(g[rO], L << 3);
    L = g[rL].l = 0;
    while (g[rO].l != g[rS].l) stack_store();
    for (k = G; ; ) {
      ⟨Store g[k] in the register stack 103⟩;
      if (k == 255) k = rB;
      else if (k == rR) k = rP;
      else if (k == rZ + 1) break;
      else k++;
    }
    O = S, g[rO] = g[rS];
    x = incr(g[rO], -8); goto store_x;
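
The loop in SAVE visits the global registers g[G] through g[255], then the special registers numbered rB through rR, then rP through rZ, and finally one extra octabyte that holds rG and rA. The order can be reproduced with a short sketch; next_saved and count_saved are hypothetical helpers, and the register numbers come from section 55.

```c
enum { rB = 0, rR = 6, rP = 23, rZ = 27 };   /* special register numbers */

/* Successor of k in SAVE order, or -1 after the final (rG,rA) slot,
   which the simulator represents by k == rZ + 1. */
static int next_saved(int k)
{
    if (k == 255) return rB;
    if (k == rR) return rP;
    if (k == rZ + 1) return -1;
    return k + 1;
}

/* Number of octabytes stored by the loop for a given value of G. */
static int count_saved(int G)
{
    int n = 0;
    for (int k = G; k >= 0; k = next_saved(k)) n++;
    return n;
}
```

For G = 255 the loop stores 14 octabytes: g[255], the seven registers rB through rR, the five registers rP through rZ, and the combined (rG,rA) word.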



a: octa, §61.
b: octa, §61.
cur_round: int, MMIX-ARITH §30.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
illegal_inst: label, §107.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
j: register int, §62.
k: register int, §62.
L: register int, §75.
l: tetra, §10.
l: octa *, §76.
lhs: char [], §139.
loc: octa, §61.
lring_mask: int, §76.
O: register int, §75.
oplus: octa (), MMIX-ARITH §5.
POP = #f8, §54.
PUSHGO = #be, §54.
PUSHGOI = #bf, §54.
PUSHJ = #f2, §54.
PUSHJB = #f3, §54.
rB = 0, §55.
rJ = 4, §55.
rL = 20, §55.
rO = 10, §55.
rP = 23, §55.
rR = 6, §55.
rS = 11, §55.
rZ = 27, §55.
S: int, §76.
SAVE = #fa, §54.
sprintf: int (), <stdio.h>.
stack_load: void (), §83.
stack_store: void (), §82.
store_x: label, §84.
tetra = unsigned int, §10.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
yz: register int, §62.
z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
zz: register int, §62.






103. This part of the program naturally has a lot in common with the stack_store
subroutine. (There’s a little white lie in the section name; if k is rZ + 1, we store rG
and rA, not g[k].)

⟨Store g[k] in the register stack 103⟩ ≡
  ll = mem_find(g[rS]);
  if (k == rZ + 1) x.h = G << 24, x.l = g[rA].l;
  else x = g[k];
  ll->tet = x.h; test_store_bkpt(ll);
  (ll + 1)->tet = x.l; test_store_bkpt(ll + 1);
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf("             M8[#%08x%08x]=g[%d]=#%08x%08x, rS+=8\n",
          g[rS].h, g[rS].l, k, x.h, x.l);
    else printf("             M8[#%08x%08x]=%s=#%08x%08x, rS+=8\n", g[rS].h,
          g[rS].l, k == rZ + 1 ? "(rG,rA)" : special_name[k], x.h, x.l);
  }
  S++, g[rS] = incr(g[rS], 8);

This code is used in section 102. 

104. ⟨Cases for individual MMIX instructions 84⟩ +≡
  case UNSAVE: if (xx != 0 || yy != 0) goto illegal_inst;
    z.l &= -8; g[rS] = incr(z, 8);
    for (k = rZ + 1; ; ) {
      ⟨Load g[k] from the register stack 105⟩;
      if (k == rP) k = rR;
      else if (k == rB) k = 255;
      else if (k == G) break;
      else k--;
    }
    S = g[rS].l >> 3;
    stack_load();
    k = l[S & lring_mask].l & 0xff;
    for (j = 0; j < k; j++) stack_load();
    O = S; g[rO] = g[rS]; L = k > G ? G : k;
    g[rL].l = L; a = g[rL]; g[rG].l = G; break;

105. ⟨Load g[k] from the register stack 105⟩ ≡
  g[rS] = incr(g[rS], -8);
  ll = mem_find(g[rS]);
  test_load_bkpt(ll); test_load_bkpt(ll + 1);
  if (k == rZ + 1) {
    x.l = G = g[rG].l = ll->tet >> 24, a.l = g[rA].l = (ll + 1)->tet & 0x3ffff;
    if (G < 32) x.l = G = g[rG].l = 32;
  } else g[k].h = ll->tet, g[k].l = (ll + 1)->tet;
  if (stack_tracing) {
    tracing = true;
    if (cur_line) show_line();
    if (k >= 32) printf("             rS-=8, g[%d]=M8[#%08x%08x]=#%08x%08x\n", k,
          g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else if (k == rZ + 1) printf("             (rG,rA)=M8[#%08x%08x]=#%08x%08x\n",
          g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
    else printf("             rS-=8, %s=M8[#%08x%08x]=#%08x%08x\n",
          special_name[k], g[rS].h, g[rS].l, ll->tet, (ll + 1)->tet);
  }

This code is used in section 104. 

106. The cache maintenance instructions don’t affect this simulation, because there 
are no caches. But if the user has invoked them, we do provide a bit of information 
when tracing, indicating the scope of the instruction. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case SYNCID: case SYNCIDI: case PREST: case PRESTI: case SYNCD: case SYNCDI:
  case PREGO: case PREGOI: case PRELD: case PRELDI: x = incr(w, xx); break;

107. Several loose ends remain to be nailed down. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case GO: case GOI: x = inst_ptr; inst_ptr = w; goto store_x;
  case JMP: case JMPB: inst_ptr = z;
  case SWYM: break;
  case SYNC: if (xx != 0 || yy != 0 || zz > 7) goto illegal_inst;
    if (zz <= 3) break;
  case LDVTS: case LDVTSI: privileged_inst: strcpy(lhs, "!privileged");
    goto break_inst;
  illegal_inst: strcpy(lhs, "!illegal");
  break_inst: breakpoint = tracing = true;
    if (!interacting && !interact_after_break) halted = true;
    break;



a: octa, §61.
breakpoint: bool, §61.
cur_line: int, §31.
G: register int, §75.
g: octa [], §76.
GO = #9e, §54.
GOI = #9f, §54.
h: tetra, §10.
halted: bool, §61.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
interact_after_break: bool, §61.
interacting: bool, §61.
j: register int, §62.
JMP = #f0, §54.
JMPB = #f1, §54.
k: register int, §62.
l: tetra, §10.
l: octa *, §76.
L: register int, §75.
LDVTS = #98, §54.
LDVTSI = #99, §54.
lhs: char [], §139.
ll: register mem_tetra *, §62.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
O: register int, §75.
PREGO = #9c, §54.
PREGOI = #9d, §54.
PRELD = #9a, §54.
PRELDI = #9b, §54.
PREST = #ba, §54.
PRESTI = #bb, §54.
printf: int (), <stdio.h>.
rA = 21, §55.
rB = 0, §55.
rG = 19, §55.
rL = 20, §55.
rO = 10, §55.
rP = 23, §55.
rR = 6, §55.
rS = 11, §55.
rZ = 27, §55.
S: int, §76.
show_line: void (), §47.
special_name: char *[], §56.
stack_load: void (), §83.
stack_store: void (), §82.
stack_tracing: bool, §61.
store_x: label, §84.
strcpy: char *(), <string.h>.
SWYM = #fd, §54.
SYNC = #fc, §54.
SYNCD = #b8, §54.
SYNCDI = #b9, §54.
SYNCID = #bc, §54.
SYNCIDI = #bd, §54.
test_load_bkpt = macro (), §83.
test_store_bkpt = macro (), §82.
tet: tetra, §16.
tracing: bool, §61.
true = 1, §9.
UNSAVE = #fb, §54.
w: octa, §61.
x: octa, §61.
xx: register int, §62.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
108. Trips and traps. We have now implemented 253 of the 256 instructions:
all but TRIP, TRAP, and RESUME. 

The TRIP instruction simply turns H_BIT on in the exc variable; this will trigger 
an interruption to location 0. 

The TRAP instruction is not simulated, except for the system calls mentioned in the 
introduction. 

⟨Cases for individual MMIX instructions 84⟩ +≡
  case TRIP: exc |= H_BIT; break;
  case TRAP: if (xx != 0 || yy > max_sys_call) goto privileged_inst;
    strcpy(rhs, trap_format[yy]);
    g[rWW] = inst_ptr;
    g[rXX].h = sign_bit, g[rXX].l = inst;
    g[rYY] = y, g[rZZ] = z;
    z.h = 0, z.l = zz;
    a = incr(b, 8);
    ⟨Prepare memory arguments ma = M[a] and mb = M[b] if needed 111⟩;
    switch (yy) {
    case Halt: ⟨Either halt or print warning 109⟩; g[rBB] = g[255]; break;
    case Fopen: g[rBB] = mmix_fopen((unsigned char) zz, mb, ma); break;
    case Fclose: g[rBB] = mmix_fclose((unsigned char) zz); break;
    case Fread: g[rBB] = mmix_fread((unsigned char) zz, mb, ma); break;
    case Fgets: g[rBB] = mmix_fgets((unsigned char) zz, mb, ma); break;
    case Fgetws: g[rBB] = mmix_fgetws((unsigned char) zz, mb, ma); break;
    case Fwrite: g[rBB] = mmix_fwrite((unsigned char) zz, mb, ma); break;
    case Fputs: g[rBB] = mmix_fputs((unsigned char) zz, b); break;
    case Fputws: g[rBB] = mmix_fputws((unsigned char) zz, b); break;
    case Fseek: g[rBB] = mmix_fseek((unsigned char) zz, b); break;
    case Ftell: g[rBB] = mmix_ftell((unsigned char) zz); break;
    }
    x = g[255] = g[rBB]; break;

109. ⟨Either halt or print warning 109⟩ ≡
  if (!zz) halted = breakpoint = true;
  else if (zz == 1) {
    if (loc.h || loc.l >= 0x90) goto privileged_inst;
    print_trip_warning(loc.l >> 4, incr(g[rW], -4));
  } else goto privileged_inst;

This code is used in section 108. 

110. ⟨Global variables 19⟩ +≡
  char arg_count[] = {1, 3, 1, 3, 3, 3, 3, 2, 2, 2, 1};
  char *trap_format[] = {"Halt(%z)",
    "$255 = Fopen(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fclose(%!z) = %x",
    "$255 = Fread(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fgets(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fgetws(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fwrite(%!z,M8[%#b]=%#q,M8[%#a]=%p) = %x",
    "$255 = Fputs(%!z,%#b) = %x",
    "$255 = Fputws(%!z,%#b) = %x",
    "$255 = Fseek(%!z,%b) = %x",
    "$255 = Ftell(%!z) = %x"};
111. ⟨ Prepare memory arguments ma = M[a] and mb = M[b] if needed 111 ⟩ ≡
  if (arg_count[yy] == 3) {
    ll = mem_find(b); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    mb.h = ll->tet, mb.l = (ll + 1)->tet;
    ll = mem_find(a); test_load_bkpt(ll); test_load_bkpt(ll + 1);
    ma.h = ll->tet, ma.l = (ll + 1)->tet;
  }

This code is used in section 108. 

112 . The input/output operations invoked by TRAPs are done by subroutines in an 
auxiliary program module called MMIX-IO. Here we need only declare those subrou- 
tines, and write three primitive interfaces on which they depend. 

113. ⟨ Global variables 19 ⟩ +≡
  extern void mmix_io_init ARGS((void));
  extern octa mmix_fopen ARGS((unsigned char, octa, octa));
  extern octa mmix_fclose ARGS((unsigned char));
  extern octa mmix_fread ARGS((unsigned char, octa, octa));
  extern octa mmix_fgets ARGS((unsigned char, octa, octa));
  extern octa mmix_fgetws ARGS((unsigned char, octa, octa));
  extern octa mmix_fwrite ARGS((unsigned char, octa, octa));
  extern octa mmix_fputs ARGS((unsigned char, octa));
  extern octa mmix_fputws ARGS((unsigned char, octa));
  extern octa mmix_fseek ARGS((unsigned char, octa));
  extern octa mmix_ftell ARGS((unsigned char));
  extern void print_trip_warning ARGS((int, octa));
  extern void mmix_fake_stdin ARGS((FILE *));



a: octa, §61.
ARGS = macro (), §11.
b: octa, §61.
breakpoint: bool, §61.
exc: int, §61.
Fclose = 2, §59.
Fgets = 4, §59.
Fgetws = 5, §59.
FILE, <stdio.h>.
Fopen = 1, §59.
Fputs = 7, §59.
Fputws = 8, §59.
Fread = 3, §59.
Fseek = 9, §59.
Ftell = 10, §59.
Fwrite = 6, §59.
g: octa [], §76.
h: tetra, §10.
H_BIT = macro, §57.
Halt = 0, §59.
halted: bool, §61.
incr: octa (), MMIX-ARITH §6.
inst: tetra, §61.
inst_ptr: octa, §61.
l: tetra, §10.
ll: register mem_tetra *, §62.
loc: octa, §61.
ma: octa, §61.
max_sys_call = macro, §59.
mb: octa, §61.
mem_find: mem_tetra *(), §20.
mmix_fake_stdin: void (), MMIX-IO §10.
mmix_fclose: octa (), MMIX-IO §11.
mmix_fgets: octa (), MMIX-IO §14.
mmix_fgetws: octa (), MMIX-IO §16.
mmix_fopen: octa (), MMIX-IO §8.
mmix_fputs: octa (), MMIX-IO §19.
mmix_fputws: octa (), MMIX-IO §20.
mmix_fread: octa (), MMIX-IO §12.
mmix_fseek: octa (), MMIX-IO §21.
mmix_ftell: octa (), MMIX-IO §22.
mmix_fwrite: octa (), MMIX-IO §18.
mmix_io_init: void (), MMIX-IO §7.
octa = struct, §10.
print_trip_warning: void (), MMIX-IO §23.
privileged_inst: label, §107.
rBB = 7, §55.
rhs = macro, §139.
rW = 24, §55.
rWW = 28, §55.
rXX = 29, §55.
rYY = 30, §55.
rZZ = 31, §55.
sign_bit = macro, §15.
strcpy: char *(), <string.h>.
test_load_bkpt = macro (), §83.
tet: tetra, §16.
TRAP = #00, §54.
TRIP = #ff, §54.
true = 1, §9.
x: octa, §61.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
114. The subroutine mmgetchars(buf, size, addr, stop) reads characters starting at
address addr in the simulated memory and stores them in buf, continuing until size
characters have been read or some other stopping criterion has been met. If stop < 0
there is no other criterion; if stop = 0 a null character will also terminate the process;
otherwise addr is even, and two consecutive null bytes starting at an even address will
terminate the process. The number of bytes read and stored, exclusive of terminating
nulls, is returned.

⟨ Subroutines 12 ⟩ +≡
  int mmgetchars ARGS((char *, int, octa, int));
  int mmgetchars(buf, size, addr, stop)
    char *buf;
    int size;
    octa addr;
    int stop;
  {
    register char *p;
    register int m;
    register mem_tetra *ll;
    register tetra x;
    octa a;
    for (p = buf, m = 0, a = addr; m < size; ) {
      ll = mem_find(a); test_load_bkpt(ll);
      x = ll->tet;
      if ((a.l & 0x3) || m > size - 4) ⟨ Read and store one byte; return if done 115 ⟩
      else ⟨ Read and store up to four bytes; return if done 116 ⟩
    }
    return size;
  }

115. ⟨ Read and store one byte; return if done 115 ⟩ ≡
  {
    *p = (x >> (8 * ((~a.l) & 0x3))) & 0xff;
    if (!*p && stop >= 0) {
      if (stop == 0) return m;
      if ((a.l & 0x1) && *(p - 1) == '\0') return m - 1;
    }
    p++, m++, a = incr(a, 1);
  }

This code is used in section 114. 

116. ⟨ Read and store up to four bytes; return if done 116 ⟩ ≡
  {
    *p = x >> 24;
    if (!*p && (stop == 0 || (stop > 0 && x < 0x10000))) return m;
    *(p + 1) = (x >> 16) & 0xff;
    if (!*(p + 1) && stop == 0) return m + 1;
    *(p + 2) = (x >> 8) & 0xff;
    if (!*(p + 2) && (stop == 0 || (stop > 0 && (x & 0xffff) == 0))) return m + 2;
    *(p + 3) = x & 0xff;
    if (!*(p + 3) && stop == 0) return m + 3;
    p += 4, m += 4, a = incr(a, 4);
  }

This code is used in section 114. 
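
The three stopping rules of mmgetchars can be tried out in isolation. What follows is only a minimal sketch, not part of MMIX-SIM: it assumes a flat byte array in place of the simulated memory and reads one byte at a time, so the four-byte fast path of section 116 is not modeled. The name get_chars_flat and its mem parameter are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: copy bytes from the flat array mem (standing in
   for simulated memory) into buf, starting at index addr and obeying the
   stop rules of section 114. Returns the number of bytes stored,
   exclusive of terminating nulls. */
int get_chars_flat(char *buf, int size, const unsigned char *mem,
                   size_t addr, int stop)
{
    int m;
    for (m = 0; m < size; m++, addr++) {
        buf[m] = (char)mem[addr];
        if (buf[m] == 0 && stop >= 0) {
            if (stop == 0) return m;          /* a single null byte ends it */
            if ((addr & 1) && m > 0 && buf[m - 1] == 0)
                return m - 1;                 /* a null wyde at an even address ends it */
        }
    }
    return size;                              /* size bytes were read */
}
```

With stop > 0 a lone null byte inside a wyde does not terminate the scan; only two null bytes that begin at an even address do.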

117. The subroutine mmputchars(buf, size, addr) puts size characters into the simulated
memory starting at address addr.

⟨ Subroutines 12 ⟩ +≡
  void mmputchars ARGS((unsigned char *, int, octa));
  void mmputchars(buf, size, addr)
    unsigned char *buf;
    int size;
    octa addr;
  {
    register unsigned char *p;
    register int m;
    register mem_tetra *ll;
    octa a;
    for (p = buf, m = 0, a = addr; m < size; ) {
      ll = mem_find(a); test_store_bkpt(ll);
      if ((a.l & 0x3) || m > size - 4) ⟨ Load and write one byte 118 ⟩
      else ⟨ Load and write four bytes 119 ⟩;
    }
  }

118. ⟨ Load and write one byte 118 ⟩ ≡
  {
    register int s = 8 * ((~a.l) & 0x3);
    ll->tet ^= (((ll->tet >> s) ^ *p) & 0xff) << s;
    p++, m++, a = incr(a, 1);
  }

This code is used in section 117. 

119. ⟨ Load and write four bytes 119 ⟩ ≡
  {
    ll->tet = (*p << 24) + (*(p + 1) << 16) + (*(p + 2) << 8) + *(p + 3);
    p += 4, m += 4, a = incr(a, 4);
  }

This code is used in section 117. 
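
The byte-replacement trick of section 118 deserves a second look: XORing a word with (old byte XOR new byte), shifted into place, changes exactly one byte and leaves the other three alone. Here is a small sketch of that idea on a plain 32-bit word; the function names put_byte and pack_tetra are hypothetical, and the word is assumed to be big-endian as in MMIX (byte 0 is the most significant).

```c
#include <stdint.h>

/* Replace byte number (addr & 3) of a big-endian 32-bit word by c,
   using the XOR trick of section 118. */
uint32_t put_byte(uint32_t tet, uint32_t addr, unsigned char c)
{
    int s = 8 * (int)(~addr & 3);       /* shift count: 24, 16, 8, or 0 */
    return tet ^ ((((tet >> s) ^ c) & 0xff) << s);
}

/* Assemble four consecutive bytes into one word, as in section 119. */
uint32_t pack_tetra(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) + ((uint32_t)p[1] << 16)
         + ((uint32_t)p[2] << 8) + (uint32_t)p[3];
}
```

For example, put_byte(0x11223344, 1, 0xaa) yields 0x11aa3344, because address 1 within the word selects the second most significant byte.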



ARGS = macro (), §11.
incr: octa (), MMIX-ARITH §6.
l: tetra, §10.
mem_find: mem_tetra *(), §20.
mem_tetra = struct, §16.
octa = struct, §10.
test_load_bkpt = macro (), §83.
test_store_bkpt = macro (), §82.
tet: tetra, §16.
tetra = unsigned int, §10.
120. When standard input is being read by the simulated program at the same
time as it is being used for interaction, we try to keep the two uses separate by
maintaining a private buffer for the simulated program’s StdIn. Online input is
usually transmitted from the keyboard to a C program a line at a time; therefore
an fgets operation works much better than fread when we prompt for new input.
But there is a slight complication, because fgets might read a null character before
coming to a newline character. We cannot deduce the number of characters read by
fgets simply by looking at strlen(stdin_buf).

⟨ Subroutines 12 ⟩ +≡
  char stdin_chr ARGS((void));
  char stdin_chr()
  {
    register char *p;
    while (stdin_buf_start == stdin_buf_end) {
      if (interacting) {
        printf("StdIn> "); fflush(stdout);
      }
      if (!fgets(stdin_buf, 256, stdin))
        panic("End of file on standard input; use the -f option, not <");
      stdin_buf_start = stdin_buf;
      for (p = stdin_buf; p < stdin_buf + 254; p++)
        if (*p == '\n') break;
      stdin_buf_end = p + 1;
    }
    return *stdin_buf_start++;
  }

121. ⟨ Global variables 19 ⟩ +≡
  char stdin_buf[256]; /* standard input to the simulated program */
  char *stdin_buf_start; /* current position in that buffer */
  char *stdin_buf_end; /* current end of that buffer */
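
The complication with null characters is easy to demonstrate. The sketch below is an illustration only; line_length is a hypothetical name. It finds the end of a line by scanning for the newline, in the manner of stdin_chr, instead of trusting strlen.

```c
#include <stddef.h>
#include <string.h>

/* Number of characters in the line stored in buf[0..cap-1], counting
   the newline. The scan looks only for '\n', so a null byte that
   precedes the newline does not cut the line short. If no newline is
   found, the scan stops two positions before the end of the buffer,
   as the loop in stdin_chr does. */
size_t line_length(const char *buf, size_t cap)
{
    size_t i;
    for (i = 0; i + 2 < cap; i++)
        if (buf[i] == '\n') break;
    return i + 1;
}
```

If the buffer holds the seven bytes 'a', 'b', '\0', 'c', 'd', '\n', '\0', then strlen reports 2 while line_length reports 6.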

122 . Just after executing each instruction, we do the following. Underflow that is 
exact and not enabled is ignored. (This applies also to underflow that was triggered 
by RESUME_SET.) 

⟨ Check for trip interrupt 122 ⟩ ≡
  if ((exc & (U_BIT + X_BIT)) == U_BIT && !(g[rA].l & U_BIT)) exc &= ~U_BIT;
  if (exc) {
    if (exc & tracing_exceptions) tracing = true;
    j = exc & (g[rA].l | H_BIT); /* find all exceptions that have been enabled */
    if (j) ⟨ Initiate a trip interrupt 123 ⟩;
    g[rA].l |= exc >> 8;
  }

This code is used in section 60. 

123. ⟨ Initiate a trip interrupt 123 ⟩ ≡
  {
    tripping = true;
    for (k = 0; !(j & H_BIT); j <<= 1, k++) ;
    exc &= ~(H_BIT >> k); /* trips taken are not logged as events */
    g[rW] = inst_ptr;
    inst_ptr.h = 0, inst_ptr.l = k << 4;
    g[rX].h = sign_bit, g[rX].l = inst;
    if ((op & 0xe0) == STB) g[rY] = w, g[rZ] = b;
    else g[rY] = y, g[rZ] = z;
    g[rB] = g[255];
    g[255] = g[rJ];
    if (op == TRIP) w = g[rW], x = g[rX], a = g[255];
  }

This code is used in section 122. 
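
The loop in section 123 selects the leftmost enabled exception and turns its position into a handler address that is a multiple of 16. The following sketch isolates that computation. It is not the simulator’s code: the function name trip_address is hypothetical, and the bit values below are assumptions of this sketch (H_BIT for TRIP in position 16, then D, V, W, I, O, U, Z, X down to position 8); the real definitions are macros in section 57.

```c
/* Assumed exception-bit layout; see the caveat in the text above. */
#define X_BIT (1 << 8)
#define Z_BIT (1 << 9)
#define U_BIT (1 << 10)
#define O_BIT (1 << 11)
#define I_BIT (1 << 12)
#define W_BIT (1 << 13)
#define V_BIT (1 << 14)
#define D_BIT (1 << 15)
#define H_BIT (1 << 16)

/* Given the set j of enabled exceptions that occurred, return the
   address of the handler for the highest-priority one and remove that
   exception from *exc. Returns (unsigned)-1 if j holds no exception. */
unsigned trip_address(int j, int *exc)
{
    int k;
    if (!(j & ((H_BIT << 1) - X_BIT))) return (unsigned)-1;
    for (k = 0; !(j & H_BIT); j <<= 1, k++) ;   /* count positions below H_BIT */
    *exc &= ~(H_BIT >> k);                      /* the trip taken is not logged */
    return (unsigned)k << 4;
}
```

Under these assumptions a TRIP goes to location 0, an integer divide check to location 16, and a floating inexact exception to location 128.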

124 . We are finally ready for the last case. 

⟨ Cases for individual MMIX instructions 84 ⟩ +≡
  case RESUME: if (xx || yy || zz) goto illegal_inst;
    inst_ptr = z = g[rW];
    b = g[rX];
    if (!(b.h & sign_bit)) ⟨ Prepare to perform a ropcode 125 ⟩;
    break;



a: octa, §117.
ARGS = macro (), §11.
b: octa, §61.
exc: int, §61.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fread: size_t (), <stdio.h>.
g: octa [], §76.
h: tetra, §10.
H_BIT = macro, §57.
illegal_inst: label, §107.
inst: tetra, §61.
inst_ptr: octa, §61.
interacting: bool, §61.
j: register int, §62.
k: register int, §62.
l: tetra, §10.
op: register mmix_opcode, §62.
panic = macro (), §14.
printf: int (), <stdio.h>.
rA = 21, §55.
rB = 0, §55.
RESUME = #f9, §54.
RESUME_SET = 2, §125.
rJ = 4, §55.
rW = 24, §55.
rX = 25, §55.
rY = 26, §55.
rZ = 27, §55.
sign_bit = macro, §15.
STB = #a0, §54.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
tracing: bool, §61.
tracing_exceptions: int, §61.
TRIP = #ff, §54.
tripping: bool, §61.
true = 1, §9.
U_BIT = macro, §57.
w: octa, §61.
x: register tetra, §114.
X_BIT = macro, §57.
xx: register int, §62.
y: octa, §61.
yy: register int, §62.
z: octa, §61.
zz: register int, §62.
125. Here we check to see if the ropcode restrictions hold. If so, the ropcode will
actually be obeyed on the next fetch phase.

#define RESUME_AGAIN 0 /* repeat the command in rX as if in location rW − 4 */
#define RESUME_CONT 1 /* same, but substitute rY and rZ for operands */
#define RESUME_SET 2 /* set register $X to rZ */

⟨ Prepare to perform a ropcode 125 ⟩ ≡
  {
    rop = b.h >> 24; /* the ropcode is the leading byte of rX */
    switch (rop) {
    case RESUME_CONT: if ((1 << (b.l >> 28)) & 0x8f30) goto illegal_inst;
    case RESUME_SET: k = (b.l >> 16) & 0xff;
      if (k >= L && k < G) goto illegal_inst;
    case RESUME_AGAIN: if ((b.l >> 24) == RESUME) goto illegal_inst;
      break;
    default: goto illegal_inst;
    }
    resuming = true;
  }

This code is used in section 124. 

126. ⟨ Install special operands when resuming an interrupted operation 126 ⟩ ≡
  if (rop == RESUME_SET) {
    op = ORI;
    y = g[rZ];
    z = zero_octa;
    exc = g[rX].h & 0xff00;
    f = X_is_dest_bit;
  } else { /* RESUME_CONT */
    y = g[rY];
    z = g[rZ];
  }

This code is used in section 71. 

127. We don’t want to count the UNSAVE that bootstraps the whole process.

⟨ Update the clocks 127 ⟩ ≡
  if (sclock.l || sclock.h || !resuming) {
    sclock.h += info[op].mems; /* clock goes up by 2^32 for each μ */
    sclock = incr(sclock, info[op].oops); /* clock goes up by 1 for each υ */
    if ((!(loc.h & sign_bit) || (g[rU].h & 0x8000)) &&
        ((op & (g[rU].h >> 16)) == (g[rU].h >> 24))) {
      g[rU].l++;
      if (g[rU].l == 0) { g[rU].h++; if ((g[rU].h & 0x7fff) == 0) g[rU].h -= 0x8000; }
    } /* usage counter counts matched instructions simulated */
    if (g[rI].l <= info[op].oops && g[rI].l && g[rI].h == 0) tracing = breakpoint = true;
    g[rI] = incr(g[rI], -info[op].oops); /* interval timer counts down by υ */
  }

This code is used in section 60. 
128. Tracing. After an instruction has been executed, we often want to display
its effect. This part of the program prints out a symbolic interpretation of what has
just happened.

⟨ Trace the current instruction, if requested 128 ⟩ ≡
  if (tracing) {
    if (showing_source && cur_line) show_line();
    ⟨ Print the frequency count, the location, and the instruction 130 ⟩;
    ⟨ Print a stream-of-consciousness description of the instruction 131 ⟩;
    if (showing_stats || breakpoint) show_stats(breakpoint);
    just_traced = true;
  } else if (just_traced) {
    printf(" ...............................................\n");
    just_traced = false;
    shown_line = -gap - 1; /* gap will not be filled */
  }

This code is used in section 60. 

129. ⟨ Global variables 19 ⟩ +≡
  bool showing_stats; /* should traced instructions also show the statistics? */
  bool just_traced; /* was the previous instruction traced? */



b: octa, §61.
bool = enum, §9.
breakpoint: bool, §61.
cur_line: int, §31.
exc: int, §61.
f: register tetra, §62.
false = 0, §9.
G: register int, §75.
g: octa [], §76.
gap: int, §48.
h: tetra, §10.
illegal_inst: label, §107.
incr: octa (), MMIX-ARITH §6.
info: op_info [], §65.
k: register int, §62.
L: register int, §75.
l: tetra, §10.
loc: octa, §61.
mems: unsigned char, §64.
oops: unsigned char, §64.
op: register mmix_opcode, §62.
ORI = #c1, §54.
printf: int (), <stdio.h>.
RESUME = #f9, §54.
resuming: bool, §61.
rI = 12, §55.
rop: int, §61.
rU = 17, §55.
rX = 25, §55.
rY = 26, §55.
rZ = 27, §55.
sclock: octa, §19.
show_line: void (), §47.
show_stats: void (), §140.
showing_source: bool, §48.
shown_line: int, §48.
sign_bit = macro, §15.
tracing: bool, §61.
true = 1, §9.
X_is_dest_bit = #20, §65.
y: octa, §61.
z: octa, §61.
zero_octa: octa, MMIX-ARITH §4.
130. ⟨ Print the frequency count, the location, and the instruction 130 ⟩ ≡
  if (resuming && op != RESUME) {
    switch (rop) {
    case RESUME_AGAIN:
      printf("           (%08x%08x: %08x (%s)) ", loc.h, loc.l, inst, info[op].name);
      break;
    case RESUME_CONT: printf("           (%08x%08x: %04xrYrZ (%s)) ", loc.h, loc.l,
        inst >> 16, info[op].name); break;
    case RESUME_SET: printf("           (%08x%08x: ..%02x..rZ (SET)) ", loc.h,
        loc.l, (inst >> 16) & 0xff); break;
    }
  } else {
    ll = mem_find(loc);
    printf("%10d. %08x%08x: %08x (%s) ", ll->freq, loc.h, loc.l, inst, info[op].name);
  }

This code is used in section 128. 

131. This part of the simulator was inspired by ideas of E. H. Satterthwaite, 
Software — Practice and Experience 2 (1972), 197-217. Online debugging tools have 
improved significantly since Satterthwaite published his work, but good offline tools 
are still valuable; alas, today’s algebraic programming languages do not provide 
tracing facilities that come anywhere close to the level of quality that Satterthwaite 
was able to demonstrate for ALGOL in 1970. 

⟨ Print a stream-of-consciousness description of the instruction 131 ⟩ ≡
  if (lhs[0] == '!') printf("%s instruction!\n", lhs + 1); /* privileged or illegal */
  else {
    ⟨ Print changes to rL 132 ⟩;
    if (z.l == 0 && (op == ADDUI || op == ORI)) p = "%l = %y = %#x"; /* LDA, SET */
    else p = info[op].trace_format;
    for ( ; *p; p++) ⟨ Interpret character *p in the trace format 133 ⟩;
    if (exc) printf(", rA=#%05x", g[rA].l);
    if (tripping) tripping = false, printf(", -> #%02x", inst_ptr.l);
    printf("\n");
  }

This code is used in section 128. 

132. Push, pop, and UNSAVE instructions display changes to rL and rO explicitly;
otherwise the change is implicit, if L ≠ old_L.

⟨ Print changes to rL 132 ⟩ ≡
  if (L != old_L && !(f & push_pop_bit)) printf("rL=%d, ", L);

This code is used in section 131. 






ADDUI = #23, §54.
exc: int, §61.
f: register tetra, §62.
false = 0, §9.
freq: tetra, §16.
g: octa [], §76.
h: tetra, §10.
info: op_info [], §65.
inst: tetra, §61.
inst_ptr: octa, §61.
l: tetra, §10.
L: register int, §75.
lhs: char [], §139.
ll: register mem_tetra *, §62.
loc: octa, §61.
mem_find: mem_tetra *(), §20.
name: char *, §64.
old_L: int, §61.
op: register mmix_opcode, §62.
ORI = #c1, §54.
p: register char *, §62.
printf: int (), <stdio.h>.
push_pop_bit = #80, §65.
rA = 21, §55.
RESUME = #f9, §54.
RESUME_AGAIN = 0, §125.
RESUME_CONT = 1, §125.
RESUME_SET = 2, §125.
resuming: bool, §61.
rop: int, §61.
trace_format: char *, §64.
tripping: bool, §61.
z: octa, §61.
133. Each MMIX instruction has a trace format string, which defines its symbolic
representation. For example, the string for ADD is "%l = %y + %z = %x"; if the
instruction is, say, ADD $1,$2,$3 with $2 = 5 and $3 = 8, and if the stack offset is
100, the trace output will be "$1=l[101] = 5 + 8 = 13".

Percent signs (%) induce special format conventions, as follows:

• %a, %b, %p, %q, %w, %x, %y, and %z stand for the numeric contents of octabytes a,
b, ma, mb, w, x, y, and z, respectively; a “style” character may follow the percent
sign in this case, as explained below.

• %( and %) are brackets that indicate the mode of floating point rounding. If
round_mode = ROUND_NEAR, ROUND_OFF, ROUND_UP, ROUND_DOWN, the corresponding
brackets are ( and ), [ and ], ^ and ^, _ and _. Such brackets are placed around a
floating point operator; for example, floating point addition is denoted by ‘[+]’ when
the current rounding mode is rounding-off.

• %l stands for the string lhs, which usually represents the “left hand side” of the
instruction just performed, formatted as a register number and its equivalent in the
ring of local registers (e.g., ‘$1=l[101]’) or as a register number and its equivalent
in the array of global registers (e.g., ‘$255=g[255]’). The POP instruction uses lhs to
indicate how the “hole” in the register stack was plugged.

• %r means to switch to string rhs and continue formatting from there. This mecha-
nism allows us to use variable formats for opcodes like TRAP that have several variants.

• %t means to print either ‘Yes, ->loc’ (where loc is the location of the next
instruction) or ‘No’, depending on the value of x.

• %g means to print ‘ (bad guess)’ if good is false.

• %s stands for the name of special register g[zz].

• %? stands for omission of the following operator if z = 0. For example, the memory
address of LDBI is described by ‘%#y%?+’; this means to treat the address as simply
‘%#y’ if z = 0, otherwise as ‘%#y+%z’. This case is used only when z is a relatively
small number (z.h = 0).

⟨ Interpret character *p in the trace format 133 ⟩ ≡
  {
    if (*p != '%') fputc(*p, stdout);
    else {
      style = decimal;
    char_switch:
      switch (*++p) {
        ⟨ Cases for formatting characters 134 ⟩;
      default: printf("BUG!!"); /* can’t happen */
      }
    }
  }

This code is used in section 131. 
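
To see the trace format conventions at work on a small scale, here is a sketch of an interpreter for a tiny subset of them. It is an illustration under stated assumptions, not the simulator’s code: only %x, %y, and %z are handled, only the ‘#’ style character is recognized, the operands are plain unsigned long values rather than octabytes, and the output goes into a caller-supplied buffer. The name expand_trace is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Expand fmt into out[0..cap-1]. %x, %y, %z print the given values in
   decimal, or in hexadecimal prefixed by '#' when the style character
   '#' follows the percent sign. All other characters are copied.
   Expansion stops quietly at an unsupported format character or when
   the buffer is full. */
void expand_trace(char *out, size_t cap, const char *fmt,
                  unsigned long x, unsigned long y, unsigned long z)
{
    size_t n = 0;
    if (cap == 0) return;
    out[0] = '\0';
    for (; *fmt && n + 1 < cap; fmt++) {
        unsigned long v;
        int hex = 0, w;
        if (*fmt != '%') {                   /* ordinary character: copy it */
            out[n++] = *fmt;
            out[n] = '\0';
            continue;
        }
        if (fmt[1] == '#') hex = 1, fmt++;   /* optional style character */
        switch (*++fmt) {
        case 'x': v = x; break;
        case 'y': v = y; break;
        case 'z': v = z; break;
        default: return;                     /* outside this sketch */
        }
        w = snprintf(out + n, cap - n, hex ? "#%lx" : "%lu", v);
        if (w < 0 || (size_t)w >= cap - n) return;   /* output was truncated */
        n += (size_t)w;
    }
}
```

With x = 13, y = 5, and z = 8, the format "%y + %z = %x" expands to "5 + 8 = 13", which is the right-hand part of the ADD example in section 133.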
134. Octabytes are printed as decimal numbers unless a “style” character intervenes
between the percent sign and the name of the octabyte: ‘#’ denotes hexadecimal
notation, prefixed by #; ‘0’ denotes hexadecimal notation with no prefixed # and with
leading zeros not suppressed; ‘.’ denotes floating decimal notation; and ‘!’ means to
use the names StdIn, StdOut, or StdErr if the value is 0, 1, or 2.

⟨ Cases for formatting characters 134 ⟩ ≡
  case '#': style = hex; goto char_switch;
  case '0': style = zhex; goto char_switch;
  case '.': style = floating; goto char_switch;
  case '!': style = handle; goto char_switch;

See also sections 136 and 138. 

This code is used in section 133. 



135. ⟨ Type declarations 9 ⟩ +≡
  typedef enum {
    decimal, hex, zhex, floating, handle
  } fmt_style;



136. ⟨ Cases for formatting characters 134 ⟩ +≡
  case 'a': trace_print(a); break;
  case 'b': trace_print(b); break;
  case 'p': trace_print(ma); break;
  case 'q': trace_print(mb); break;
  case 'w': trace_print(w); break;
  case 'x': trace_print(x); break;
  case 'y': trace_print(y); break;
  case 'z': trace_print(z); break;



a: octa, §61.
b: octa, §61.
false = 0, §9.
fputc: int (), <stdio.h>.
g: octa [], §76.
good: bool, §61.
h: tetra, §10.
lhs: char [], §139.
ma: octa, §61.
mb: octa, §61.
p: register char *, §62.
printf: int (), <stdio.h>.
rhs = macro, §139.
ROUND_DOWN = 3, §100.
round_mode: int, §61.
ROUND_NEAR = 4, §100.
ROUND_OFF = 1, §100.
ROUND_UP = 2, §100.
stdout: FILE *, <stdio.h>.
style: fmt_style, §137.
trace_print: void (), §137.
w: octa, §61.
x: octa, §61.
y: octa, §61.
z: octa, §61.
zz: register int, §62.
137. ⟨ Subroutines 12 ⟩ +≡
  fmt_style style;
  char *stream_name[] = {"StdIn", "StdOut", "StdErr"};
  void trace_print ARGS((octa));
  void trace_print(o)
    octa o;
  {
    switch (style) {
    case decimal: print_int(o); return;
    case hex: fputc('#', stdout); print_hex(o); return;
    case zhex: printf("%08x%08x", o.h, o.l); return;
    case floating: print_float(o); return;
    case handle: if (o.h == 0 && o.l < 3) printf(stream_name[o.l]);
      else print_int(o); return;
    }
  }

138. ⟨ Cases for formatting characters 134 ⟩ +≡
  case '(': fputc(left_paren[round_mode], stdout); break;
  case ')': fputc(right_paren[round_mode], stdout); break;
  case 't': if (x.l) printf(" Yes, -> #"), print_hex(inst_ptr);
    else printf(" No"); break;
  case 'g': if (!good) printf(" (bad guess)"); break;
  case 's': printf(special_name[zz]); break;
  case '?': p++; if (z.l) printf("%c%d", *p, z.l); break;
  case 'l': printf(lhs); break;
  case 'r': p = switchable_string; break;

139. #define rhs &switchable_string[1]

⟨ Global variables 19 ⟩ +≡
  char left_paren[] = {0, '[', '^', '_', '('}; /* denotes the rounding mode */
  char right_paren[] = {0, ']', '^', '_', ')'}; /* denotes the rounding mode */
  char switchable_string[48]; /* holds rhs; position 0 is ignored */
    /* switchable_string must be able to hold any trap_format */
  char lhs[32];
  int good_guesses, bad_guesses; /* branch prediction statistics */
140. ⟨ Subroutines 12 ⟩ +≡
  void show_stats ARGS((bool));
  void show_stats(verbose)
    bool verbose;
  {
    octa o;
    printf("  %d instruction%s, %d mem%s, %d oop%s; %d good guess%s, %d bad\n",
      g[rU].l, g[rU].l == 1 ? "" : "s",
      sclock.h, sclock.h == 1 ? "" : "s",
      sclock.l, sclock.l == 1 ? "" : "s",
      good_guesses, good_guesses == 1 ? "" : "es", bad_guesses);
    if (!verbose) return;
    o = halted ? incr(inst_ptr, -4) : inst_ptr;
    printf("  (%s at location #%08x%08x)\n", halted ? "halted" : "now", o.h, o.l);
  }



ARGS = macro (), §11.
bool = enum, §9.
decimal = 0, §135.
floating = 3, §135.
fmt_style = enum, §135.
fputc: int (), <stdio.h>.
g: octa [], §76.
good: bool, §61.
h: tetra, §10.
halted: bool, §61.
handle = 4, §135.
hex = 1, §135.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
l: tetra, §10.
octa = struct, §10.
p: register char *, §62.
print_float: void (), MMIX-ARITH §54.
print_hex: void (), §12.
print_int: void (), §15.
printf: int (), <stdio.h>.
round_mode: int, §61.
rU = 17, §55.
sclock: octa, §19.
special_name: char *[], §56.
stdout: FILE *, <stdio.h>.
trap_format: char *[], §110.
x: octa, §61.
z: octa, §61.
zhex = 2, §135.
zz: register int, §62.
141. Running the program. Now we are ready to fit the pieces together into
a working simulator.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <signal.h>
#include "abstime.h"
⟨ Preprocessor macros 11 ⟩
⟨ Type declarations 9 ⟩
⟨ Global variables 19 ⟩
⟨ Subroutines 12 ⟩

int main(argc, argv)
  int argc;
  char *argv[];
{
  ⟨ Local registers 62 ⟩;
  mmix_io_init();
  ⟨ Process the command line 142 ⟩;
  ⟨ Initialize everything 14 ⟩;
  ⟨ Load the command line arguments 163 ⟩;
  ⟨ Get ready to UNSAVE the initial context 164 ⟩;
  while (1) {
    if (interrupt && !breakpoint) breakpoint = interacting = true, interrupt = false;
    else {
      breakpoint = false;
      if (interacting) ⟨ Interact with the user 149 ⟩;
    }
    if (halted) break;
    do ⟨ Perform one instruction 60 ⟩ while ((!interrupt && !breakpoint) || resuming);
    if (interact_after_break) interacting = true, interact_after_break = false;
  }
end_simulation: if (profiling) ⟨ Print all the frequency counts 53 ⟩;
  if (interacting || profiling || showing_stats) show_stats(true);
  return g[255].l; /* provide rudimentary feedback for non-interactive runs */
}

142. Here we process the command line options; when we finish, *cur_arg should
be the name of the object file to be loaded and simulated.

We assume that argv[0] is never null. (The author believes strongly that the wizards
who decided to allow argc = 0 were mistaken when they defined the C89 standard;
hence he has taken no pains to avoid system crashes when people try to invoke any
of his programs with a null environment. Null invocations are contrary to the intent
of C’s designers.)

#define mmo_file_name *cur_arg

⟨ Process the command line 142 ⟩ ≡
  myself = argv[0];
  for (cur_arg = argv + 1; *cur_arg && (*cur_arg)[0] == '-'; cur_arg++)
    scan_option(*cur_arg + 1, true);
  if (!*cur_arg) scan_option("?", true); /* exit with usage note */
  argc -= cur_arg - argv; /* this is the argc of the user program */
This code is used in section 141. 
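
The effect of this loop on argc can be checked with a few lines of C. The following is merely a sketch with a hypothetical helper name, first_non_option; it uses an index where the simulator uses the pointer cur_arg, and it does not call scan_option.

```c
/* Return the index of the first argument after argv[0] that does not
   begin with '-', or argc if every argument is an option. */
int first_non_option(int argc, char **argv)
{
    int i;
    for (i = 1; i < argc && argv[i][0] == '-'; i++) ;
    return i;
}
```

For the command line ‘mmix -t2 -i prog.mmo arg’ the helper returns 3, so the user program sees an argc of 5 − 3 = 2, namely the object file name and one argument.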



breakpoint: bool, §61.
cur_arg: char **, §144.
false = 0, §9.
g: octa [], §76.
halted: bool, §61.
interact_after_break: bool, §61.
interacting: bool, §61.
interrupt: bool, §144.
l: tetra, §10.
mmix_io_init: void (), MMIX-IO §7.
myself: char *, §144.
profiling: bool, §144.
resuming: bool, §61.
scan_option: void (), §143.
show_stats: void (), §140.
showing_stats: bool, §129.
true = 1, §9.
143. Careful readers of the following subroutine will notice a little white bug: A
tracing specification like t1000000000 or even t0000000000 or even t!!!!!!!!!! is
silently converted to t4294967295.

The -b and -c options are effective only on the command line, but they are harmless
while interacting.

⟨ Subroutines 12 ⟩ +≡
  void scan_option ARGS((char *, bool));
  void scan_option(arg, usage)
    char *arg; /* command line argument (without the ‘-’) */
    bool usage; /* should we exit with usage note if unrecognized? */
  {
    register int k;
    switch (*arg) {
    case 't': if (strlen(arg) > 10) trace_threshold = 0xffffffff;
      else if (sscanf(arg + 1, "%d", &trace_threshold) != 1) trace_threshold = 0;
      return;
    case 'e': if (!*(arg + 1)) tracing_exceptions = 0xff;
      else if (sscanf(arg + 1, "%x", &tracing_exceptions) != 1) tracing_exceptions = 0;
      return;
    case 'r': stack_tracing = true; return;
    case 's': showing_stats = true; return;
    case 'l': if (!*(arg + 1)) gap = 3;
      else if (sscanf(arg + 1, "%d", &gap) != 1) gap = 0;
      showing_source = true; return;
    case 'L': if (!*(arg + 1)) profile_gap = 3;
      else if (sscanf(arg + 1, "%d", &profile_gap) != 1) profile_gap = 0;
      profile_showing_source = true;
    case 'P': profiling = true; return;
    case 'v': trace_threshold = 0xffffffff; tracing_exceptions = 0xff;
      stack_tracing = true; showing_stats = true;
      gap = 10, showing_source = true;
      profile_gap = 10, profile_showing_source = true, profiling = true;
      return;
    case 'q': trace_threshold = tracing_exceptions = 0;
      stack_tracing = showing_stats = showing_source = false;
      profiling = profile_showing_source = false;
      return;
    case 'i': interacting = true; return;
    case 'I': interact_after_break = true; return;
    case 'b': if (sscanf(arg + 1, "%d", &buf_size) != 1) buf_size = 0; return;
    case 'c': if (sscanf(arg + 1, "%d", &lring_size) != 1) lring_size = 0; return;
    case 'f': ⟨ Open a file for simulated standard input 145 ⟩; return;
    case 'D': ⟨ Open a file for dumping binary output 146 ⟩; return;
    default: if (usage) {
        fprintf(stderr, "Usage: %s <options> progfile command-line-args...\n",
          myself);
        for (k = 0; usage_help[k][0]; k++) fprintf(stderr, usage_help[k]);
        exit(-1);
      } else for (k = 0; usage_help[k][1] != 'b'; k++) printf(usage_help[k]);
      return;
    }
  }
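
The “little white bug” can be reproduced outside the simulator. The function below is a sketch that mirrors only the ‘t’ case of scan_option; its name, parse_trace_threshold, is hypothetical, and it reads the number with "%u" into an unsigned int, which is an assumption of this sketch rather than the simulator’s own conversion.

```c
#include <stdio.h>
#include <string.h>

/* arg points at the option letter, for example "t25". Any argument
   longer than ten characters is treated as the maximum threshold,
   whatever follows the 't'. */
unsigned int parse_trace_threshold(const char *arg)
{
    unsigned int t;
    if (strlen(arg) > 10) return 0xffffffffu;      /* the white bug */
    if (sscanf(arg + 1, "%u", &t) != 1) return 0;  /* no number present */
    return t;
}
```

Thus "t25" gives 25, while both "t1000000000" and "t!!!!!!!!!!" give 4294967295, because the length test is made before the digits are examined.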



ARGS = macro (), §11.
bool = enum, §9.
buf_size: int, §40.
exit: void (), <stdlib.h>.
false = 0, §9.
fprintf: int (), <stdio.h>.
gap: int, §48.
interact_after_break: bool, §61.
interacting: bool, §61.
lring_size: int, §76.
myself: char *, §144.
printf: int (), <stdio.h>.
profile_gap: int, §48.
profile_showing_source: bool, §48.
profiling: bool, §144.
showing_source: bool, §48.
showing_stats: bool, §129.
sscanf: int (), <stdio.h>.
stack_tracing: bool, §61.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
trace_threshold: tetra, §61.
tracing_exceptions: int, §61.
true = 1, §9.
usage_help: char *[], §144.
144. ⟨ Global variables 19 ⟩ +≡
  char *myself; /* argv[0], the name of this simulator */
  char **cur_arg; /* pointer to current place in the argument vector */
  bool interrupt; /* has the user interrupted the simulation recently? */
  bool profiling; /* should we print the profile at the end? */
  FILE *fake_stdin; /* file substituted for the simulated StdIn */
  FILE *dump_file; /* file used for binary dumps */
  char *usage_help[] = {
    " with these options: (<n>=decimal number, <x>=hex number)\n",
    "-t<n> trace each instruction the first n times\n",
    "-e<x> trace each instruction with an exception matching x\n",
    "-r    trace hidden details of the register stack\n",
    "-l<n> list source lines when tracing, filling gaps <= n\n",
    "-s    show statistics after each traced instruction\n",
    "-P    print a profile when simulation ends\n",
    "-L<n> list source lines with the profile\n",
    "-v    be verbose: show almost everything\n",
    "-q    be quiet: show only the simulated standard output\n",
    "-i    run interactively (prompt for online commands)\n",
    "-I    interact, but only after the program halts\n",
    "-b<n> change the buffer size for source lines\n",
    "-c<n> change the cyclic local register ring size\n",
    "-f<filename> use given file to simulate standard input\n",
    "-D<filename> dump a file for use by other simulators\n",
    ""};
  char *interactive_help[] = {
    "The interactive commands are:\n",
    "<return>  trace one instruction\n",
    "n         trace one instruction\n",
    "c         continue until halt or breakpoint\n",
    "q         quit the simulation\n",
    "s         show current statistics\n",
    "l<n><t>   set and/or show local register in format t\n",
    "g<n><t>   set and/or show global register in format t\n",
    "rA<t>     set and/or show register rA in format t\n",
    "$<n><t>   set and/or show dynamic register in format t\n",
    "M<x><t>   set and/or show memory octabyte in format t\n",
    "+<n><t>   set and/or show n additional octabytes in format t\n",
    " <t> is ! (decimal) or . (floating) or # (hex) or \" (string)\n",
    "     or <empty> (previous <t>) or =<value> (change value)\n",
    "@<x>      go to location x\n",
    "b[rwx]<x> set or reset breakpoint at location x\n",
    "t<x>      trace location x\n",
    "u<x>      untrace location x\n",
    "T         set current segment to Text_Segment\n",
    "D         set current segment to Data_Segment\n",
    "P         set current segment to Pool_Segment\n",
    "S         set current segment to Stack_Segment\n",
    "B         show all current breakpoints and tracepoints\n",
    "i<file>   insert commands from file\n",
    "-<option> change a tracing/listing/profile option\n",
    "-?        show the tracing/listing/profile options  \n",
    ""};

145. (Open a file for simulated standard input 145) =
  if (fake_stdin) fclose(fake_stdin);
  fake_stdin = fopen(arg + 1, "r");
  if (!fake_stdin) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);
  else mmix_fake_stdin(fake_stdin);

This code is used in section 143.

146. (Open a file for dumping binary output 146) =
  dump_file = fopen(arg + 1, "wb");
  if (!dump_file) fprintf(stderr, "Sorry, I can't open file %s!\n", arg + 1);

This code is used in section 143.

147. (Initialize everything 14) +=
  signal(SIGINT, catchint);  /* now catchint will catch the first interrupt */

148. (Subroutines 12) +=
  void catchint ARGS((int));
  void catchint(n)
    int n;
  {
    interrupt = true;
    signal(SIGINT, catchint);  /* now catchint will catch the next interrupt */
  }



arg: char *, §143.
ARGS = macro (), §11.
argv: char *[], §141.
bool = enum, §9.
fclose: int (), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
mmix_fake_stdin: void (), MMIX-IO §10.
SIGINT = macro, <signal.h>.
signal: void (*())(), <signal.h>.
stderr: FILE *, <stdio.h>.
true = 1, §9.
149. (Interact with the user 149) =
  { register int repeating;
  interact: (Put a new command in command_buf 150);
    p = command_buf;
    repeating = 0;
    switch (*p) {
    case '\n': case 'n': breakpoint = tracing = true;  /* trace one inst and break */
    case 'c': goto resume_simulation;  /* continue until breakpoint */
    case 'q': goto end_simulation;
    case 's': show_stats(true); goto interact;
    case '-': k = strlen(p); if (p[k - 1] == '\n') p[k - 1] = '\0';
      scan_option(p + 1, false); goto interact;
    (Cases that change cur_disp_mode 152);
    (Cases that define cur_disp_type 153);
    (Cases that set and clear tracing and breakpoints 161);
    default: what_say: k = strlen(command_buf);
      if (k < 10 && command_buf[k - 1] == '\n') command_buf[k - 1] = '\0';
      else strcpy(command_buf + 9, "...");
      printf("Eh? Sorry, I don't understand `%s'. (Type h for help)\n",
             command_buf);
      goto interact;
    case 'h': for (k = 0; interactive_help[k][0]; k++) printf(interactive_help[k]);
      goto interact;
    }
  check_syntax: if (*p != '\n') {
      if (!*p)
  incomplete_str: printf("Syntax error: Incomplete command!\n");
      else {
        p[strlen(p) - 1] = '\0';
        printf("Syntax error; I'm ignoring `%s'!\n", p);
      }
    }
    while (repeating) (Display and/or set the value of the current octabyte 156);
    goto interact;
  resume_simulation: ;
  }

This code is used in section 141.

150. (Put a new command in command_buf 150) =
  { register bool ready = false;
  incl_read: while (incl_file && !ready)
      if (!fgets(command_buf, command_buf_size, incl_file)) {
        fclose(incl_file);
        incl_file = NULL;
      } else if (command_buf[0] != '\n' && command_buf[0] != 'i' && command_buf[0] != '%')
        if (command_buf[0] == ' ') printf("%s", command_buf);
        else ready = true;
    while (!ready) {
      printf("mmix> "); fflush(stdout);
      if (!fgets(command_buf, command_buf_size, stdin)) command_buf[0] = 'q';
      if (command_buf[0] != 'i') ready = true;
      else {
        command_buf[strlen(command_buf) - 1] = '\0';
        incl_file = fopen(command_buf + 1, "r");
        if (incl_file) goto incl_read;
        if (isspace(command_buf[1])) incl_file = fopen(command_buf + 2, "r");
        if (incl_file) goto incl_read;
        printf("Can't open file `%s'!\n", command_buf + 1);
      }
    }
  }

This code is used in section 149.



151. #define command_buf_size 1024  /* make it plenty long, for floating point tests */

(Global variables 19) +=
  char command_buf[command_buf_size];
  FILE *incl_file;  /* file of commands included by 'i' */
  char cur_disp_mode = 'l';  /* 'l' or 'g' or '$' or 'M' */
  char cur_disp_type = '!';  /* '!' or '.' or '#' or '"' */
  bool cur_disp_set;  /* was the last <t> of the form =<val>? */
  octa cur_disp_addr;  /* the h half is relevant only in mode 'M' */
  octa cur_seg;  /* current segment offset */
  char spec_reg_code[] = {rA, rB, rC, rD, rE, rF, rG, rH, rI, rJ, rK, rL, rM, rN, rO, rP,
      rQ, rR, rS, rT, rU, rV, rW, rX, rY, rZ};
  char spec_regg_code[] = {0, rBB, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, rTT, 0, 0, rWW,
      rXX, rYY, rZZ};



bool = enum, §9.
breakpoint: bool, §61.
end_simulation: label, §141.
false = 0, §9.
fclose: int (), <stdio.h>.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
h: tetra, §10.
interactive_help: char *[], §144.
isspace: int (), <ctype.h>.
k: register int, §62.
octa = struct, §10.
p: register char *, §62.
printf: int (), <stdio.h>.
rA = 21, §55.
rB = 0, §55.
rBB = 7, §55.
rC = 8, §55.
rD = 1, §55.
rE = 2, §55.
rF = 22, §55.
rG = 19, §55.
rH = 3, §55.
rI = 12, §55.
rJ = 4, §55.
rK = 15, §55.
rL = 20, §55.
rM = 5, §55.
rN = 9, §55.
rO = 10, §55.
rP = 23, §55.
rQ = 16, §55.
rR = 6, §55.
rS = 11, §55.
rT = 13, §55.
rTT = 14, §55.
rU = 17, §55.
rV = 18, §55.
rW = 24, §55.
rWW = 28, §55.
rX = 25, §55.
rXX = 29, §55.
rY = 26, §55.
rYY = 30, §55.
rZ = 27, §55.
rZZ = 31, §55.
scan_option: void (), §143.
show_stats: void (), §140.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
tracing: bool, §61.
true = 1, §9.
152. (Cases that change cur_disp_mode 152) =
  case 'l': case 'g': case '$': cur_disp_mode = *p++;
    for (cur_disp_addr.l = 0; isdigit(*p); p++)
      cur_disp_addr.l = 10 * cur_disp_addr.l + *p - '0';
    goto new_mode;
  case 'r': p++; cur_disp_mode = 'g';
    if (*p < 'A' || *p > 'Z') goto what_say;
    if (*(p + 1) != *p) cur_disp_addr.l = spec_reg_code[*p - 'A'], p++;
    else if (spec_regg_code[*p - 'A']) cur_disp_addr.l = spec_regg_code[*p - 'A'], p += 2;
    else goto what_say;
    goto new_mode;
  case 'M': cur_disp_mode = *p;
    cur_disp_addr = scan_hex(p + 1, cur_seg); cur_disp_addr.l &= -8; p = next_char;
  new_mode: cur_disp_set = false;  /* the '=' is remembered only by '+' */
    repeating = 1;
    goto scan_type;
  case '+': if (!isdigit(*(p + 1))) repeating = 1;
    for (p++; isdigit(*p); p++) repeating = 10 * repeating + *p - '0';
    if (repeating) {
      if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
      else cur_disp_addr.l++;
    }
    goto scan_type;

This code is used in section 149.

153. (Cases that define cur_disp_type 153) =
  case '!': case '.': case '#': case '"': cur_disp_set = false;
    repeating = 1;
  set_type: cur_disp_type = *p++; break;
  scan_type: if (*p == '!' || *p == '.' || *p == '#' || *p == '"') goto set_type;
    if (*p != '=') break;
    goto scan_eql;
  case '=': repeating = 1;
  scan_eql: cur_disp_set = true;
    val = zero_octa;
    if (*++p == '#') cur_disp_type = *p, val = scan_hex(p + 1, zero_octa);
    else if (*p == '"' || *p == '\'') goto scan_string;
    else cur_disp_type = (scan_const(p) > 0 ? '.' : '!');
    p = next_char;
    if (*p != ',') break;
    val.h = 0; val.l &= 0xff;
  scan_string: cur_disp_type = '"';
    (Scan a string constant 155); break;

This code is used in section 149.

154. (Subroutines 12) +=
  octa scan_hex ARGS((char *, octa));
  octa scan_hex(s, offset)
    char *s;
    octa offset;
  {
    register char *p;
    octa o;
    o = zero_octa;
    for (p = s; isxdigit(*p); p++) {
      o = incr(shift_left(o, 4), *p - '0');
      if (*p >= 'a') o = incr(o, '0' - 'a' + 10);
      else if (*p >= 'A') o = incr(o, '0' - 'A' + 10);
    }
    next_char = p;
    return oplus(o, offset);
  }
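
The digit-by-digit accumulation in scan_hex can be tried outside the simulator. The following is only a sketch under stated assumptions: it replaces the two-tetra octa type by uint64_t, and the names scan_hex64 and next are hypothetical, not part of MMIX-SIM.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for scan_hex: a uint64_t plays the role of octa.
   Scans hexadecimal digits starting at s, adds offset to the result, and
   reports where scanning stopped through *next (if next is not NULL). */
uint64_t scan_hex64(const char *s, uint64_t offset, const char **next)
{
    uint64_t o = 0;
    const char *p;
    assert(s != NULL);
    for (p = s; isxdigit((unsigned char)*p); p++) {
        unsigned d = (unsigned)(*p - '0');              /* right for '0'..'9' */
        if (*p >= 'a') d = (unsigned)(*p - 'a' + 10);   /* lowercase digit */
        else if (*p >= 'A') d = (unsigned)(*p - 'A' + 10); /* uppercase digit */
        o = (o << 4) + d;                               /* wraps modulo 2^64 */
    }
    if (next != NULL) *next = p;
    return o + offset;
}
```

Passing a segment base as offset mimics the way the interactive commands interpret an address relative to the current segment.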

155. (Scan a string constant 155) =
  while (*p == ',') {
    if (*++p == '#') {
      aux = scan_hex(p + 1, zero_octa), p = next_char;
      val = incr(shift_left(val, 8), aux.l & 0xff);
    } else if (isdigit(*p)) {
      for (k = *p++ - '0'; isdigit(*p); p++) k = (10 * k + *p - '0') & 0xff;
      val = incr(shift_left(val, 8), k);
    }
    else if (*p == '\n') goto incomplete_str;
  }
  if (*p == '\'' && *(p + 2) == *p) *p = *(p + 2) = '"';
  if (*p == '"') {
    for (p++; *p && *p != '\n' && *p != '"'; p++) val = incr(shift_left(val, 8), *p);
    if (*p && *p++ == '"')
      if (*p == ',') goto scan_string;
  }

This code is used in section 153.



ARGS = macro (), §11.
aux: octa, MMIX-ARITH §4.
cur_disp_addr: octa, §151.
cur_disp_mode: char, §151.
cur_disp_set: bool, §151.
cur_disp_type: char, §151.
cur_seg: octa, §151.
false = 0, §9.
h: tetra, §10.
incomplete_str: label, §149.
incr: octa (), MMIX-ARITH §6.
isdigit: int (), <ctype.h>.
isxdigit: int (), <ctype.h>.
k: register int, §62.
l: tetra, §10.
next_char: char *, MMIX-ARITH §69.
octa = struct, §10.
oplus: octa (), MMIX-ARITH §5.
p: register char *, §62.
repeating: register int, §149.
scan_const: int (), MMIX-ARITH §68.
shift_left: octa (), MMIX-ARITH §7.
spec_reg_code: char [], §151.
spec_regg_code: char [], §151.
true = 1, §9.
val: octa, MMIX-ARITH §69.
what_say: label, §149.
zero_octa: octa, MMIX-ARITH §4.
156. (Display and/or set the value of the current octabyte 156) =
  {
    if (cur_disp_set) (Set the current octabyte to val 157);
    (Display the current octabyte 159);
    fputc('\n', stdout);
    repeating--;
    if (!repeating) break;
    if (cur_disp_mode == 'M') cur_disp_addr = incr(cur_disp_addr, 8);
    else cur_disp_addr.l++;
  }

This code is used in section 149.

157. (Set the current octabyte to val 157) =
  switch (cur_disp_mode) {
  case 'l': l[cur_disp_addr.l & lring_mask] = val; break;
  case '$': k = cur_disp_addr.l & 0xff;
    if (k < L) l[(O + k) & lring_mask] = val; else if (k >= G) g[k] = val;
    break;
  case 'g': k = cur_disp_addr.l & 0xff;
    if (k < 32) (Set g[k] = val only if permissible 158);
    g[k] = val; break;
  case 'M': if (!(cur_disp_addr.h & sign_bit)) {
      ll = mem_find(cur_disp_addr);
      ll->tet = val.h; (ll + 1)->tet = val.l;
    } break;
  }

This code is used in section 156.

158. Here we essentially simulate a PUT command, but we simply break if the PUT
is illegal or privileged.

(Set g[k] = val only if permissible 158) =
  if (k >= 9 && k != rI) {
    if (k < 19) break;
    if (k == rA) {
      if (val.h != 0 || val.l >= 0x40000) break;
      cur_round = (val.l >= 0x10000 ? val.l >> 16 : ROUND_NEAR);
    } else if (k == rG) {
      if (val.h != 0 || val.l > 255 || val.l < L || val.l < 32) break;
      for (j = val.l; j < G; j++) g[j] = zero_octa;
      G = val.l;
    } else if (k == rL) {
      if (val.h == 0 && val.l < L) L = val.l;
      else break;
    }
  }

This code is used in section 157.
159. (Display the current octabyte 159) =
  switch (cur_disp_mode) {
  case 'l': k = cur_disp_addr.l & lring_mask;
    printf("l[%d]=", k); aux = l[k]; break;
  case '$': k = cur_disp_addr.l & 0xff;
    if (k < L)
      printf("$%d=l[%d]=", k, (O + k) & lring_mask), aux = l[(O + k) & lring_mask];
    else if (k >= G) printf("$%d=g[%d]=", k, k), aux = g[k];
    else printf("$%d=", k), aux = zero_octa;
    break;
  case 'g': k = cur_disp_addr.l & 0xff;
    printf("g[%d]=", k); aux = g[k]; break;
  case 'M': if (cur_disp_addr.h & sign_bit) aux = zero_octa;
    else {
      ll = mem_find(cur_disp_addr);
      aux.h = ll->tet; aux.l = (ll + 1)->tet;
    }
    printf("M8[#"); print_hex(cur_disp_addr); printf("]="); break;
  }
  switch (cur_disp_type) {
  case '!': print_int(aux); break;
  case '.': print_float(aux); break;
  case '#': fputc('#', stdout); print_hex(aux); break;
  case '"': print_string(aux); break;
  }

This code is used in section 156.



aux: octa, MMIX-ARITH §4.
cur_disp_addr: octa, §151.
cur_disp_mode: char, §151.
cur_disp_set: bool, §151.
cur_disp_type: char, §151.
cur_round: int, MMIX-ARITH §30.
fputc: int (), <stdio.h>.
g: octa [], §76.
G: register int, §75.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
j: register int, §62.
k: register int, §62.
l: tetra, §10.
l: octa *, §76.
L: register int, §75.
ll: register mem_tetra *, §62.
lring_mask: int, §76.
mem_find: mem_tetra *(), §20.
O: register int, §75.
print_float: void (), MMIX-ARITH §54.
print_hex: void (), §12.
print_int: void (), §15.
print_string: void (), §160.
printf: int (), <stdio.h>.
rA = 21, §55.
repeating: register int, §149.
rG = 19, §55.
rI = 12, §55.
rL = 20, §55.
ROUND_NEAR = 4, §100.
sign_bit = macro, §15.
stdout: FILE *, <stdio.h>.
tet: tetra, §16.
val: octa, MMIX-ARITH §69.
zero_octa: octa, MMIX-ARITH §4.
160. (Subroutines 12) +=
  void print_string ARGS((octa));
  void print_string(o)
    octa o;
  {
    register int k, state, b;
    for (k = state = 0; k < 8; k++) {
      b = ((k < 4 ? o.h >> (8 * (3 - k)) : o.l >> (8 * (7 - k)))) & 0xff;
      if (b == 0) {
        if (state) printf("%s,0", state > 1 ? "\"" : ""), state = 1;
      } else if (b >= ' ' && b <= '~')
        printf("%s%c", state > 1 ? "" : state == 1 ? ",\"" : "\"", b), state = 2;
      else printf("%s#%x", state > 1 ? "\"," : state == 1 ? "," : "", b), state = 1;
    }
    if (state == 0) printf("0");
    else if (state > 1) printf("\"");
  }
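
The three-state logic of print_string is easier to follow with concrete inputs. The sketch below is an illustration, not the simulator's code: it assumes state 0 means that nothing has been printed, state 1 that the output so far ended outside quotes, and state 2 that a quoted run is still open. The name format_octa_string and the output buffer are hypothetical, and uint64_t stands in for octa.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of print_string that writes into out, which must
   have room for at least 64 characters. */
void format_octa_string(uint64_t o, char *out)
{
    int k, state = 0;   /* 0: nothing yet; 1: outside quotes; 2: inside quotes */
    assert(out != NULL);
    out[0] = '\0';
    for (k = 0; k < 8; k++) {
        unsigned b = (unsigned)(o >> (8 * (7 - k))) & 0xff;  /* big-endian byte k */
        char piece[8];
        if (b == 0) {
            if (!state) continue;               /* leading zero bytes are skipped */
            sprintf(piece, "%s,0", state > 1 ? "\"" : "");
            state = 1;
        } else if (b >= ' ' && b <= '~') {      /* printable ASCII joins a quoted run */
            sprintf(piece, "%s%c",
                    state > 1 ? "" : state == 1 ? ",\"" : "\"", (int)b);
            state = 2;
        } else {                                /* other bytes appear as #hex */
            sprintf(piece, "%s#%x",
                    state > 1 ? "\"," : state == 1 ? "," : "", b);
            state = 1;
        }
        strcat(out, piece);
    }
    if (state == 0) strcat(out, "0");           /* the octabyte was entirely zero */
    else if (state > 1) strcat(out, "\"");      /* close a quoted run left open */
}
```

With this convention an octabyte holding the bytes of Hi! in its low-order end is rendered as the quoted string "Hi!", while embedded zero or control bytes break the string into comma-separated pieces.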

161. (Cases that set and clear tracing and breakpoints 161) =
  case '@': inst_ptr = scan_hex(p + 1, cur_seg); p = next_char;
    halted = false; break;
  case 't': case 'u': k = *p;
    val = scan_hex(p + 1, cur_seg); p = next_char;
    if (val.h < 0x20000000) {
      ll = mem_find(val);
      if (k == 't') ll->bkpt |= trace_bit;
      else ll->bkpt &= ~trace_bit;
    }
    break;
  case 'b': for (k = 0, p++; !isxdigit(*p); p++)
      if (*p == 'r') k |= read_bit;
      else if (*p == 'w') k |= write_bit;
      else if (*p == 'x') k |= exec_bit;
    val = scan_hex(p, cur_seg); p = next_char;
    if (!(val.h & sign_bit)) {
      ll = mem_find(val);
      ll->bkpt = (ll->bkpt & -8) | k;
    }
    break;
  case 'T': cur_seg.h = 0; goto passit;
  case 'D': cur_seg.h = 0x20000000; goto passit;
  case 'P': cur_seg.h = 0x40000000; goto passit;
  case 'S': cur_seg.h = 0x60000000; goto passit;
  case 'B': show_breaks(mem_root);
  passit: p++; break;

This code is used in section 149.
162. (Subroutines 12) +=
  void show_breaks ARGS((mem_node *));
  void show_breaks(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) show_breaks(p->left);
    for (j = 0; j < 512; j++)
      if (p->dat[j].bkpt) {
        cur_loc = incr(p->loc, 4 * j);
        printf("  %08x%08x %c%c%c%c\n", cur_loc.h, cur_loc.l,
               p->dat[j].bkpt & trace_bit ? 't' : '-', p->dat[j].bkpt & read_bit ? 'r' : '-',
               p->dat[j].bkpt & write_bit ? 'w' : '-', p->dat[j].bkpt & exec_bit ? 'x' : '-');
      }
    if (p->right) show_breaks(p->right);
  }

163. We put pointers to the command line strings in octabytes M8[Pool_Segment +
8 * (k + 1)] for 0 ≤ k < argc; the strings themselves are octabyte-aligned, starting at
M8[Pool_Segment + 8 * (argc + 2)]. The location of the first free octabyte in the pool
segment is placed in M8[Pool_Segment].

(Load the command line arguments 163) =
  x.h = 0x40000000, x.l = 0x8;
  loc = incr(x, 8 * (argc + 1));
  for (k = 0; k < argc; k++, cur_arg++) {
    ll = mem_find(x);
    ll->tet = loc.h, (ll + 1)->tet = loc.l;
    ll = mem_find(loc);
    mmputchars((unsigned char *) *cur_arg, strlen(*cur_arg), loc);
    x.l += 8, loc.l += 8 + (strlen(*cur_arg) & -8);
  }
  x.l = 0; ll = mem_find(x); ll->tet = loc.h, (ll + 1)->tet = loc.l;

This code is used in section 141.
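
The address arithmetic of the pool segment layout can be checked in isolation. The following sketch assumes 64-bit integers in place of octa and computes only the addresses, without touching simulated memory; pool_layout, POOL_SEGMENT, and ptr are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POOL_SEGMENT 0x4000000000000000ULL  /* base address of the pool segment */

/* Sets ptr[k] to the address where string k would be stored, that is, the
   value destined for M8[POOL_SEGMENT + 8*(k+1)], and returns the first free
   octabyte, which is the value destined for M8[POOL_SEGMENT]. */
uint64_t pool_layout(int argc, char *const argv[], uint64_t *ptr)
{
    uint64_t loc;
    int k;
    assert(argc >= 0 && ptr != NULL);
    loc = POOL_SEGMENT + 8 * (uint64_t)(argc + 2);  /* strings start here */
    for (k = 0; k < argc; k++) {
        ptr[k] = loc;
        /* Each string takes its length rounded down to a multiple of 8, plus
           one more octabyte, so at least one null byte always follows it. */
        loc += 8 + (strlen(argv[k]) & ~(size_t)7);
    }
    return loc;
}
```

For the two arguments foo and 12345678, the strings land at offsets #20 and #28 from the start of the segment, and the first free octabyte is at offset #38.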



argc: int, §141.
ARGS = macro (), §11.
bkpt: unsigned char, §16.
cur_arg: char **, §144.
cur_seg: octa, §151.
dat: mem_tetra [], §16.
exec_bit = macro, §58.
false = 0, §9.
h: tetra, §10.
halted: bool, §61.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
isxdigit: int (), <ctype.h>.
k: register int, §62.
l: tetra, §10.
left: mem_node *, §16.
ll: register mem_tetra *, §62.
loc: octa, §16.
loc: octa, §61.
mem_find: mem_tetra *(), §20.
mem_node = struct, §16.
mem_root: mem_node *, §19.
mmputchars: void (), §117.
next_char: char *, MMIX-ARITH §69.
octa = struct, §10.
p: register char *, §62.
printf: int (), <stdio.h>.
read_bit = macro, §58.
right: mem_node *, §16.
scan_hex: octa (), §154.
sign_bit = macro, §15.
strlen: size_t (), <string.h>.
tet: tetra, §16.
trace_bit = macro, §58.
val: octa, MMIX-ARITH §69.
write_bit = macro, §58.
x: octa, §61.
164. (Get ready to UNSAVE the initial context 164) =
  x.h = 0, x.l = 0xf0;
  ll = mem_find(x);
  if (ll->tet) inst_ptr = x;
  resuming = true;
  rop = RESUME_AGAIN;
  g[rX].l = ((tetra) UNSAVE << 24) + 255;
  if (dump_file) {
    x.l = 1;
    dump(mem_root);
    dump_tet(0), dump_tet(0);
    exit(0);
  }

This code is used in section 141.

165. The special option ‘-D<filename>’ can be used to prepare binary files needed
by the MMIX-in-MMIX simulator of Section 1.4.3′. (See The Art of Computer Programming,
Volume 1, Fascicle 1.) This option puts big-endian octabytes into a given file;
a location l is followed by one or more nonzero octabytes M8[l], M8[l + 8], M8[l + 16],
. . . , followed by zero. The simulated simulator knows how to load programs in such
a format (see exercise 1.4.3′-20), and so does the meta-simulator MMMIX.

(Subroutines 12) +=
  void dump ARGS((mem_node *));
  void dump_tet ARGS((tetra));
  void dump(p)
    mem_node *p;
  {
    register int j;
    octa cur_loc;
    if (p->left) dump(p->left);
    for (j = 0; j < 512; j += 2)
      if (p->dat[j].tet || p->dat[j + 1].tet) {
        cur_loc = incr(p->loc, 4 * j);
        if (cur_loc.l != x.l || cur_loc.h != x.h) {
          if (x.l != 1) dump_tet(0), dump_tet(0);
          dump_tet(cur_loc.h); dump_tet(cur_loc.l); x = cur_loc;
        }
        dump_tet(p->dat[j].tet);
        dump_tet(p->dat[j + 1].tet);
        x = incr(x, 8);
      }
    if (p->right) dump(p->right);
  }
166. (Subroutines 12) +=
  void dump_tet(t)
    tetra t;
  {
    fputc(t >> 24, dump_file);
    fputc((t >> 16) & 0xff, dump_file);
    fputc((t >> 8) & 0xff, dump_file);
    fputc(t & 0xff, dump_file);
  }
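
The dump format described in section 165 is simple enough to read back with a few lines of code. The sketch below is not the loader of MMMIX or of the MMIX-in-MMIX simulator; it merely assumes the format as stated (a location, then nonzero big-endian octabytes, then a zero octabyte, repeated until the end of the file) and works on an in-memory image. The names put_octa, get_octa, and load_dump are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at img[pos..pos+7], most significant byte first; returns the new
   position. This mirrors what two successive dump_tet calls emit. */
size_t put_octa(unsigned char *img, size_t pos, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) img[pos + i] = (unsigned char)(v >> (8 * (7 - i)));
    return pos + 8;
}

/* Read one big-endian octabyte. */
uint64_t get_octa(const unsigned char *img)
{
    uint64_t v = 0;
    int i;
    for (i = 0; i < 8; i++) v = (v << 8) | img[i];
    return v;
}

/* Walk an image of n bytes, recording each stored octabyte in addr[] and
   val[] (capacity cap). Returns the number of octabytes recorded, or -1 if
   the image is malformed or the arrays are too small. */
long load_dump(const unsigned char *img, size_t n,
               uint64_t *addr, uint64_t *val, size_t cap)
{
    size_t pos = 0, count = 0;
    assert(img != NULL && addr != NULL && val != NULL);
    while (pos + 8 <= n) {
        uint64_t loc = get_octa(img + pos);   /* location l of this group */
        pos += 8;
        for (;;) {
            uint64_t v;
            if (pos + 8 > n) return -1;       /* group lacks its zero terminator */
            v = get_octa(img + pos);
            pos += 8;
            if (v == 0) break;                /* a zero octabyte closes the group */
            if (count == cap) return -1;      /* caller's arrays are full */
            addr[count] = loc;
            val[count] = v;
            count++;
            loc += 8;                         /* next octabyte of the same group */
        }
    }
    return pos == n ? (long)count : -1;
}
```

Because zero serves as the terminator, a zero octabyte inside a run cannot be represented directly; the format starts a new group after the gap instead.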



ARGS = macro (), §11.
dat: mem_tetra [], §16.
dump_file: FILE *, §144.
exit: void (), <stdlib.h>.
fputc: int (), <stdio.h>.
g: octa [], §76.
h: tetra, §10.
incr: octa (), MMIX-ARITH §6.
inst_ptr: octa, §61.
l: tetra, §10.
left: mem_node *, §16.
ll: register mem_tetra *, §62.
loc: octa, §16.
mem_find: mem_tetra *(), §20.
mem_node = struct, §16.
mem_root: mem_node *, §19.
octa = struct, §10.
RESUME_AGAIN = 0, §125.
resuming: bool, §61.
right: mem_node *, §16.
rop: int, §61.
rX = 25, §55.
tet: tetra, §16.
tetra = unsigned int, §10.
true = 1, §9.
UNSAVE = #fb, §54.
x: octa, §61.
167. Names of the sections.

( Cases for formatting characters 134, 136, 138 ) Used in section 133.

( Cases for individual MMIX instructions 84, 85, 86, 87, 88, 89, 90 , 92 , 93 , 94 , 95 , 96, 97 , 101 , 
102 , 104, 106, 107, 108, 124 ) Used in section 60. 

( Cases for lopcodes in the main loop 33 , 34 , 35 , 36 ) Used in section 29. 

( Cases that change cur_disp_mode 152 ) Used in section 149.

( Cases that define cur_disp_type 153 ) Used in section 149.

( Cases that set and clear tracing and breakpoints 161 ) Used in section 149. 

( Check for trip interrupt 122 ) Used in section 60. 

( Check if the source file has been modified 44 ) Used in section 42. 

( Convert relative address to absolute address 70 ) Used in section 60. 

(Display and/or set the value of the current octabyte 156) Used in section 149. 

( Display the current octabyte 159 ) Used in section 156. 

( Either halt or print warning 109 ) Used in section 108. 

( Fetch the next instruction 63 ) Used in section 60. 

( Fix up the subtrees of *q 22 ) Used in section 21.

( Get ready to UNSAVE the initial context 164 ) Used in section 141. 

( Get ready to update rA 100 ) Used in section 97. 

( Get ready to update rC 99 ) Used in section 97. 

( Global variables 19 , 25 , 31 , 40 , 48, 52 , 56, 61 , 65, 76, 110 , 113 , 121 , 129 , 139 , 144 , 151 ) Used 
in section 141. 

( Increase rL 81 ) Used in section 80. 

( Info for arithmetic commands 66 ) Used in section 65. 

( Info for branch commands 67 ) Used in section 65. 

( Info for load/store commands 68 ) Used in section 65. 

( Info for logical and control commands 69 ) Used in section 65. 

( Initialize everything 14, 18, 24, 32, 41, 77, 147 ) Used in section 141.

( Initiate a trip interrupt 123 ) Used in section 122. 

( Install operand fields 71 ) Used in section 60. 

( Install register X as the destination, adjusting the register stack if necessary 80 )
Used in section 60.

( Install special operands when resuming an interrupted operation 126 ) Used in 
section 71. 

( Interact with the user 149 ) Used in section 141. 

(Interpret character *p in the trace format 133 ) Used in section 131. 

(Load and write four bytes 119 ) Used in section 117. 

( Load and write one byte 118 ) Used in section 117. 

( Load g[k] from the register stack 105 ) Used in section 104. 

( Load tet as a normal item 30 ) Used in section 29. 

( Load the command line arguments 163 ) Used in section 141. 

( Load the next item 29 ) Used in section 32. 

( Load the postamble 37 ) Used in section 32. 

( Load the preamble 28 ) Used in section 32. 

( Local registers 62 , 75 ) Used in section 141. 

( Open a file for dumping binary output 146 ) Used in section 143. 
( Open a file for simulated standard input 145 ) Used in section 143.

( Perform one instruction 60 ) Used in section 141.

( Prepare memory arguments ma = M[a] and mb = M[b] if needed 111 ) Used in
section 108.

( Prepare to list lines from a new source file 49 ) Used in section 47. 

( Prepare to perform a ropcode 125 ) Used in section 124. 

( Preprocessor macros 11, 43 , 46 ) Used in section 141. 

( Print a stream-of-consciousness description of the instruction 131 ) Used in
section 128.

( Print all the frequency counts 53 ) Used in section 141. 

( Print changes to rL 132 ) Used in section 131. 

( Print frequency data for location p->loc + 4 * j 51 ) Used in section 50.

( Print the frequency count, the location, and the instruction 130 ) Used in
section 128.

( Process the command line 142 ) Used in section 141. 

( Put a new command in command_buf 150 ) Used in section 149.

(Read and store one byte; return if done 115) Used in section 114. 

( Read and store up to four bytes; return if done 116 ) Used in section 114. 

( Scan a string constant 155 ) Used in section 153. 

( Search for key in the treap, setting last_mem and p to its location 21 ) Used in
section 20.

( Set b from register X 74 ) Used in section 71. 

( Set b from special register 79 ) Used in section 71. 

( Set g[k] = val only if permissible 158 ) Used in section 157.

( Set L = z = min(z, L) 98 ) Used in section 97.

( Set the current octabyte to val 157 ) Used in section 156.

( Set y from register Y 73 ) Used in section 71.

( Set z as an immediate wyde 78 ) Used in section 71.

( Set z from register Z 72 ) Used in section 71.

( Store g[k] in the register stack 103 ) Used in section 102.

( Subroutines 12, 13, 15, 17, 20, 26, 27, 42, 45, 47, 50, 82, 83, 91, 114, 117, 120, 137, 140, 143, 148,
154, 160, 162, 165, 166 ) Used in section 141.

( Trace the current instruction, if requested 128 ) Used in section 60. 

( Type declarations 9, 10, 16, 38, 39, 54, 55, 59, 64, 135 ) Used in section 141.

(Update the clocks 127) Used in section 60. 



MMIXAL 

1. Definition of MMIXAL. This program takes input written in MMIXAL, the 
MMIX assembly language, and translates it into binary files that can be loaded and 
executed on MMIX simulators. MMIXAL is much simpler than the “industrial strength” 
assembly languages that computer manufacturers usually provide, because it is pri- 
marily intended for the simple demonstration programs in The Art of Computer 
Programming. Yet it tries to have enough features to serve also as the back end of 
compilers for C and other high-level languages. 

Instructions for using the program appear at the end of this document (see page 
487). First we will discuss the input and output languages in detail; then we’ll consider 
the translation process, step by step; then we’ll put everything together. 

2. A program in MMIXAL consists of a series of lines, each of which usually contains 
a single instruction. However, lines with no instructions are possible, and so are lines 
with two or more instructions. 

Each instruction has three parts called its label field, opcode field, and operand 
field; these fields are separated from each other by one or more spaces. The label 
field, which is often empty, consists of all characters up to the first blank space. The 
opcode field, which is never empty, runs from the first nonblank after the label to 
the next blank space. The operand field, which again might be empty, runs from the 
next nonblank character (if any) to the first blank or semicolon that isn’t part of a 
string or character constant. If the operand field is followed by a semicolon, possibly 
with intervening blanks, a new instruction begins immediately after the semicolon; 
otherwise the rest of the line is ignored. The end of a line is treated as a blank space 
for the purposes of these rules, with the additional proviso that string or character 
constants are not allowed to extend from one line to another. 

The label field must begin with a letter or a digit; otherwise the entire line is
treated as a comment. Popular ways to introduce comments, either at the beginning
of a line or after the operand field, are to precede them by the character % as in TeX,
or by // as in C++; MMIXAL is not very particular. However, Lisp-style comments
introduced by single semicolons will fail if they follow an instruction, because they
will be assumed to introduce another instruction.

3. MMIXAL has no built-in macro capability, nor does it know how to include header 
files and such things. But users can run their files through a standard C preprocessor 
to obtain MMIXAL programs in which macros and such things have been expanded. 
(Caution: The preprocessor also removes C-style comments, unless it is told not to 
do so.) Literate programming tools could also be used for preprocessing. 

If a line begins with the special form ‘# (integer) (string)’, this program interprets 
it as a line directive emitted by a preprocessor. For example, 

# 13 "foo.mms"

means that the following line was line 13 in the user’s source file foo.mms. Line 
directives allow us to correlate errors with the user’s original file; we also pass them 
to the output, for use by simulators and debuggers. 
4. MMIXAL deals primarily with symbols and constants, which it interprets and
combines to form machine language instructions and data. Constants are simplest, 
so we will discuss them first. 

A decimal constant is a sequence of digits, representing a number in radix 10.
A hexadecimal constant is a sequence of hexadecimal digits, preceded by #,
representing a number in radix 16:

(digit) → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(hex digit) → (digit) | A | B | C | D | E | F | a | b | c | d | e | f
(decimal constant) → (digit) | (decimal constant)(digit)
(hex constant) → #(hex digit) | (hex constant)(hex digit)

Constants whose value is 2^64 or more are reduced modulo 2^64.
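
The grammar and the reduction rule translate directly into code, because unsigned 64-bit arithmetic in C already wraps modulo 2^64. The following is a sketch under that assumption, not the scanner used by MMIXAL; the name scan_constant is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Scans a decimal or hex constant at the start of s. On success stores the
   value, reduced modulo 2^64, in *value and returns the number of characters
   consumed; returns 0 if s does not begin with a constant. */
size_t scan_constant(const char *s, uint64_t *value)
{
    const char *p = s;
    uint64_t v = 0;
    assert(s != NULL && value != NULL);
    if (*p == '#') {                           /* hex constant */
        if (!isxdigit((unsigned char)p[1])) return 0;
        for (p++; isxdigit((unsigned char)*p); p++) {
            int c = tolower((unsigned char)*p);
            v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else {                                   /* decimal constant */
        if (!isdigit((unsigned char)*p)) return 0;
        for (; isdigit((unsigned char)*p); p++)
            v = v * 10 + (uint64_t)(*p - '0'); /* wraps modulo 2^64 */
    }
    *value = v;
    return (size_t)(p - s);
}
```

Reducing at every step gives the same result as reducing once at the end, so a constant with more than 16 hex digits simply loses its high-order digits.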

5. A character constant is a single character enclosed in single quote marks; it
denotes the ASCII or Unicode number corresponding to that character. For example,
'a' represents the constant #61, also known as 97. The quoted character can be
anything except the character that the C library calls \n or newline; that character
should be represented as #a.

(character constant) → '(single byte character except newline)'
(constant) → (decimal constant) | (hex constant) | (character constant)

Notice that ''' represents a single quote, the code #27; and '\' represents a
backslash, the code #5c. MMIXAL characters are never “quoted” by backslashes as in the
C language.

In the present implementation a character constant will always be at most 255, 
since wyde character input is not supported. But if the input were in Unicode one 
could write, say, 'א' or 'Ж' for #05d0 or #0416. The present program does not
support Unicode directly because basic software for inputting and outputting 16-bit 
characters was still in a primitive state at the time of writing. But the data structures 
below are designed so that a change to Unicode will not be difficult when the time is 
ripe. 

6. A string constant like "Hello" is an abbreviation for a sequence of one or more
character constants separated by commas: 'H','e','l','l','o'. Any character
except newline or the double quote mark " can appear between the double quotes
of a string constant. Similarly, "高德纳" is an abbreviation for '高','德','纳' (namely
#9ad8,#5fb7,#7eb3) when Unicode is supported.
7. A symbol in MMIXAL is any sequence of letters and digits, beginning with a letter.
A colon ‘:’ or underscore symbol ‘_’ is regarded as a letter, for purposes of this
definition. All extended-ASCII characters like ‘é’, whose 8-bit code exceeds 126, are
also treated as letters.

(letter) → A | B | ··· | Z | a | b | ··· | z | : | _ | (character with code value > 126)
(symbol) → (letter) | (symbol)(letter) | (symbol)(digit)

In future implementations, when MMIXAL is used with Unicode, all wyde characters 
whose 16-bit code exceeds 126 will be regarded as letters; thus MMIXAL symbols will 
be able to involve Greek letters or Chinese characters or thousands of other glyphs. 

8. A symbol is said to be fully qualified if it begins with a colon. Every symbol that is 
not fully qualified is an abbreviation for the fully qualified symbol obtained by placing 
the current prefix in front of it; the current prefix is always fully qualified. At the 
beginning of an MMIXAL program the current prefix is simply the single character ‘ : ’, 
but the user can change it with the PREFIX command. For example,

    ADD    x,y,z              % means ADD :x,:y,:z
    PREFIX Foo:               % current prefix is :Foo:
    ADD    x,y,z              % means ADD :Foo:x,:Foo:y,:Foo:z
    PREFIX Bar:               % current prefix is :Foo:Bar:
    ADD    :x,y,:z            % means ADD :x,:Foo:Bar:y,:z
    PREFIX :                  % current prefix reverts to :
    ADD    x,Foo:Bar:y,Foo:z  % means ADD :x,:Foo:Bar:y,:Foo:z

This mechanism allows large programs to avoid conflicts between symbol names, when 
parts of the program are independent and/or written by different users. The current 
prefix conventionally ends with a colon, but this convention need not be obeyed. 
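
The qualification rule amounts to a single string concatenation, and the PREFIX command applies the same rule to its own operand. The sketch below illustrates this under stated assumptions; the function name qualify and the fixed-size buffers are hypothetical and say nothing about how MMIXAL stores its symbols internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Writes the fully qualified form of sym into out (capacity n bytes).
   A symbol that already begins with ':' is copied unchanged; any other
   symbol gets the current prefix placed in front of it.
   Returns 0 on success, -1 if the result does not fit. */
int qualify(const char *prefix, const char *sym, char *out, size_t n)
{
    int len;
    assert(prefix != NULL && sym != NULL && out != NULL);
    if (sym[0] == ':') len = snprintf(out, n, "%s", sym);
    else len = snprintf(out, n, "%s%s", prefix, sym);
    return (len >= 0 && (size_t)len < n) ? 0 : -1;
}
```

Starting from the prefix ‘:’, qualifying the operand of PREFIX Foo: gives :Foo:, and qualifying Bar: under that prefix gives :Foo:Bar:, as in the example above; qualifying the operand ‘:’ restores the initial prefix because it is already fully qualified.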

9. A local symbol is a decimal digit followed by one of the letters B, F, or H, meaning 
“backward,” “forward,” or “here”: 

(local operand) → (digit)B | (digit)F
(local label) → (digit)H

The B and F forms are permitted only in the operand field of MMIXAL instructions; 
the H form is permitted only in the label field. A local operand such as 2B stands for 
the last local label 2H in instructions before the current one, or 0 if 2H has not yet 
appeared as a label. A local operand such as 2F stands for the first 2H in instructions 
after the current one. Thus, in a sequence such as 

2H JMP 2F 
2H JMP 2B 

the first instruction jumps to the second and the second jumps to the first. 

Local symbols are useful for references to nearby points of a program, in cases where 
no meaningful name is appropriate. They can also be useful in special situations where 
a redefinable symbol is needed; for example, an instruction like 

9H IS 9B+1 

will maintain a running counter. 

10. Each symbol receives a value called its equivalent when it appears in the label
field of an instruction; it is said to be defined after its equivalent has been established.
A few symbols, like rA and ROUND_OFF and Fopen, are predefined because they refer
to fixed constants associated with the MMIX hardware or its rudimentary operating 
system; otherwise every symbol should be defined exactly once. The two appearances 
of ‘2H’ in the example above do not violate this rule, because the second ‘2H’ is not 
the same symbol as the first. 

A predefined symbol can be redefined (given a new equivalent). After it has been 
redefined it acts like an ordinary symbol and cannot be redefined again. A complete 
list of the predefined symbols appears in the program listing below. 

Equivalents are either pure or register numbers. A pure equivalent is an unsigned 
octabyte, but a register number equivalent is a one-byte value, between 0 and 255. 
A dollar sign is used to change a pure number into a register number; for example, 
‘$20’ means register number 20. 

11. Constants and symbols are combined into expressions in a simple way:

⟨primary expression⟩ → ⟨constant⟩ | ⟨symbol⟩ | ⟨local operand⟩ | @ |
        (⟨expression⟩) | ⟨unary operator⟩⟨primary expression⟩
⟨term⟩ → ⟨primary expression⟩ | ⟨term⟩⟨strong operator⟩⟨primary expression⟩
⟨expression⟩ → ⟨term⟩ | ⟨expression⟩⟨weak operator⟩⟨term⟩
⟨unary operator⟩ → + | - | ~ | $ | &
⟨strong operator⟩ → * | / | // | % | << | >> | &
⟨weak operator⟩ → + | - | | | ^

Each expression has a value that is either pure or a register number. The character @
stands for the current location, which is always pure. The unary operators +, -, ~, $,
and & mean, respectively, “do nothing,” “subtract from zero,” “complement the bits,”
“change from pure value to register number,” and “take the serial number.” Only
the first of these, +, can be applied to a register number. The last unary operator,
&, applies only to symbols, and it is of interest primarily to system programmers; 
it converts a symbol to the unique positive integer that is used to identify it in the 
binary file output by MMIXAL. 

Binary operators come in two flavors, strong and weak. The strong ones are 
essentially concerned with multiplication or division: x*y, x/y, x//y, x%y, x<<y,
x>>y, and x&y stand respectively for (x × y) mod 2^64 (multiplication), ⌊x/y⌋ (division),
⌊2^64 x/y⌋ (fractional division), x mod y (remainder), (x × 2^y) mod 2^64 (left shift),
⌊x/2^y⌋ (right shift), and x & y (bitwise and) on unsigned octabytes. Division is legal
only if y > 0; fractional division is legal only if x < y. None of the strong binary
operations can be applied to register numbers. 
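
Fractional division is the least familiar of these operations, so a small C sketch may help. It computes ⌊2^64 x/y⌋ by ordinary shift-and-subtract division using only 64-bit arithmetic; the function name frac_div is hypothetical, and this is not claimed to be the routine that MMIXAL actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional division x//y = floor(2^64 * x / y), defined only for x < y.
   Restoring shift-and-subtract division: one quotient bit per iteration. */
uint64_t frac_div(uint64_t x, uint64_t y)
{
    uint64_t r = x, q = 0;
    assert(x < y);                    /* the legality condition stated above */
    for (int i = 0; i < 64; i++) {
        int carry = (int)(r >> 63);   /* bit lost when the remainder is doubled */
        r <<= 1;
        q <<= 1;
        if (carry || r >= y) {        /* the true doubled remainder is at least y */
            r -= y;                   /* still correct modulo 2^64 when carry is set */
            q |= 1;
        }
    }
    return q;
}
```

For instance, frac_div(1, 2) is #8000000000000000, the octabyte that represents the fraction 1/2.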

The weak binary operations x+y, x-y, x|y, and x^y stand respectively for (x +
y) mod 2^64 (addition), (x − y) mod 2^64 (subtraction), x | y (bitwise or), and x ⊕ y
(bitwise exclusive-or) on unsigned octabytes. These operations can be applied to
register numbers only in four contexts: ⟨register⟩ + ⟨pure⟩, ⟨pure⟩ + ⟨register⟩,
⟨register⟩ − ⟨pure⟩, and ⟨register⟩ − ⟨register⟩. For example, if x denotes $1 and y
denotes $10, then x+3 and 3+x denote $4, and y-x denotes the pure value 9. 
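
The four legal register contexts amount to a small type rule, sketched here in C. The type value and the functions weak_add and weak_sub are hypothetical names chosen for illustration; only + and − are modeled, since | and ^ never accept register numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t val;    /* the octabyte value */
    bool is_reg;     /* true for a register number, false for a pure value */
} value;             /* hypothetical type, named here for illustration */

/* x+y is legal unless both operands are register numbers. */
bool weak_add(value x, value y, value *out)
{
    assert(out != NULL);
    if (x.is_reg && y.is_reg) return false;
    out->val = x.val + y.val;              /* wraps modulo 2^64 */
    out->is_reg = x.is_reg || y.is_reg;
    return true;
}

/* x-y is illegal only in the form <pure> - <register>. */
bool weak_sub(value x, value y, value *out)
{
    assert(out != NULL);
    if (!x.is_reg && y.is_reg) return false;
    out->val = x.val - y.val;              /* wraps modulo 2^64 */
    out->is_reg = x.is_reg && !y.is_reg;   /* register minus register is pure */
    return true;
}
```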

Register numbers within expressions are allowed to be arbitrary octabytes, but a 
register number assigned as the equivalent of a symbol should not exceed 255. 

(Incidentally, one might ask why the designer of MMIXAL did not simply adopt the 
existing rules of C for expressions. The primary reason is that the designers of C 
chose to give <<, >>, and & a lower precedence than +; but in MMIXAL we want to be 
able to write things like o<<24+x<<16+y<<8+z or @+yz<<2 or @+(#100-@)&#ff. Since
the conventions of C were inappropriate, it was better to make a clean break, not 
pretending to have a close relationship with that language. The new rules are quite 
easily memorized, because MMIXAL has just two levels of precedence, and the strong 
binary operations are all essentially multiplicative by nature while the weak binary 
operations are essentially additive.) 
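
To make the two-level rule concrete, here is a minimal recursive-descent evaluator in C. It is a sketch under several assumptions: it handles only pure decimal and #-prefixed hexadecimal constants, omits symbols, @, $, the serial-number operator, and the division operators, and does no error reporting. The names primary, term, expression, and eval are hypothetical, chosen to mirror the syntax rules above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

static const char *p;                 /* cursor into the expression text */
static uint64_t expression(void);

static uint64_t primary(void)         /* constants, parentheses, unary + - ~ */
{
    uint64_t v = 0;
    if (*p == '(') {
        p++; v = expression();
        if (*p == ')') p++;
    } else if (*p == '+') { p++; v = primary(); }
    else if (*p == '-') { p++; v = 0 - primary(); }
    else if (*p == '~') { p++; v = ~primary(); }
    else if (*p == '#') {             /* hexadecimal constant */
        for (p++; isxdigit((unsigned char)*p); p++) {
            int c = tolower((unsigned char)*p);
            v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
        }
    } else {                          /* decimal constant */
        for (; isdigit((unsigned char)*p); p++)
            v = v * 10 + (uint64_t)(*p - '0');
    }
    return v;
}

static uint64_t term(void)            /* strong operators bind tightly */
{
    uint64_t v = primary(), s;
    for (;;) {
        if (p[0] == '<' && p[1] == '<') { p += 2; s = primary(); v = s < 64 ? v << s : 0; }
        else if (p[0] == '>' && p[1] == '>') { p += 2; s = primary(); v = s < 64 ? v >> s : 0; }
        else if (*p == '*') { p++; v *= primary(); }
        else if (*p == '&') { p++; v &= primary(); }
        else return v;
    }
}

static uint64_t expression(void)      /* weak operators bind loosely */
{
    uint64_t v = term();
    for (;;) {
        if (*p == '+') { p++; v += term(); }
        else if (*p == '-') { p++; v -= term(); }
        else if (*p == '|') { p++; v |= term(); }
        else if (*p == '^') { p++; v ^= term(); }
        else return v;
    }
}

uint64_t eval(const char *s)
{
    assert(s != NULL);
    p = s;
    return expression();
}
```

With these rules the text ‘1+1<<4’ means 1 + (1<<4); C would read the same characters as (1+1)<<4.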

12. A symbol is called a future reference until it has been defined. MMIXAL restricts
the use of future references, so that programs can be assembled quickly in one pass 
over the input; therefore all expressions can be evaluated when the MMIXAL processor 
first sees them. 

The restrictions are easily stated: Future references cannot be used in expressions
together with unary or binary operators (except the unary +, which does nothing);
moreover, future references can appear as operands only in instructions that have
relative addresses (namely branches, probable branches, JMP, PUSHJ, GETA) or in
octabyte constants (the pseudo-operation OCTA). Thus, for example, one can say
JMP 1F or JMP 1B-4, but not JMP 1F-4.

13. We noted earlier that each MMIXAL instruction contains a label field, an opcode 
field, and an operand field. The label field is either empty or a symbol or local label; 
when it is nonempty, the symbol or local label receives an equivalent. The operand 
field is either empty or a sequence of expressions separated by commas; when it is 
empty, it is equivalent to the simple operand field ‘0’.

⟨instruction⟩ → ⟨label⟩⟨opcode⟩⟨operand list⟩
⟨label⟩ → ⟨empty⟩ | ⟨symbol⟩ | ⟨local label⟩
⟨operand list⟩ → ⟨empty⟩ | ⟨expression list⟩
⟨expression list⟩ → ⟨expression⟩ | ⟨expression list⟩ , ⟨expression⟩

The opcode field contains either a symbolic MMIX operation name (like ADD), or 
an alias operation, or a pseudo-operation. Alias operations are alternate names
for MMIX operations whose standard names are inappropriate in certain contexts. 
Pseudo-operations do not correspond directly to MMIX commands, but they govern 
the assembly process in important ways. 

There are two alias operations: 

• SET $X,$Y is equivalent to OR $X,$Y,0; it sets register X to register Y. Similarly, 
SET $X,Y (when Y is not a register) is equivalent to SETL $X,Y. 

• LDA $X,$Y,$Z is equivalent to ADDU $X,$Y,$Z; it loads the address of memory
location $Y+$Z into register X. Similarly, LDA $X,$Y,Z is equivalent to ADDU $X,$Y,Z.

The symbolic operation names for genuine MMIX operations should not include the 
suffix I for an immediate operation or the suffix B for a backward jump; MMIXAL 
determines such things automatically. Thus, one never writes ADDI or JMPB in the 
source input to MMIXAL, although such opcodes might appear when a simulator or 
debugger or disassembler is presenting a numeric instruction in symbolic form. 

⟨opcode⟩ → ⟨symbolic MMIX operation⟩ | ⟨alias operation⟩ | ⟨pseudo-operation⟩
⟨symbolic MMIX operation⟩ → TRAP | FCMP | ··· | TRIP
⟨alias operation⟩ → SET | LDA
⟨pseudo-operation⟩ → IS | LOC | PREFIX | GREG | LOCAL | BSPEC | ESPEC
        | BYTE | WYDE | TETRA | OCTA

14. MMIX operations like ADD require exactly three expressions as operands. The
first two must be register numbers. The third must be either a register number or a 
pure number between 0 and 255; in the latter case, ADD becomes ADDI in the assembled 
output. Thus, for example, the command “set register 1 to the sum of register 2 and 
register 3” could be expressed as 



        ADD $1,$2,$3

or as, say,

        ADD x,y,y+1

if the equivalent of x is $1 and the equivalent of y is $2. The command “subtract 5
from register 1” could be expressed as

        SUB $1,$1,5

or as

        SUB x,x,5

but not as ‘SUBI $1,$1,5’ or ‘SUBI x,x,5’.

MMIX operations like FLOT require either three operands (register, pure, register/pure)
or only two (register, register/pure). In the first case the middle operand is
the rounding mode, which is best expressed in terms of the predefined symbolic values
ROUND_CURRENT, ROUND_OFF, ROUND_UP, ROUND_DOWN, ROUND_NEAR, for (0, 1, 2, 3, 4)
respectively. In the second case the middle operand is understood to be zero (namely,
ROUND_CURRENT).

MMIX operations like SETL or INCH, which involve a wyde intermediate constant, 
require exactly two operands, (register, pure). The value of the second operand should 
fit in two bytes. 

MMIX operations like BNZ, which mention a register and a relative address, also 
require two operands. The first operand should be a register number. The second 
operand should yield a result r in the range −2^16 ≤ r < 2^16 when the current location
is subtracted from it and the result is divided by 4. The second operand might also
be undefined; in that case, the eventual value must satisfy the restriction stated for
defined values. The opcodes GETA and PUSHJ are similar, except that the first operand
to PUSHJ might also be pure (see below). The JMP operation is also similar, but it
has only one operand, and it allows the larger address range −2^24 ≤ r < 2^24.
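
The range test for a 16-bit relative address can be sketched in C as follows. The function encode_branch is a hypothetical helper; it assumes that the difference between the two addresses must be a multiple of 4, and that a negative r is stored in the YZ field as r + 2^16 together with the backward form of the opcode (such as BNZB).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes r = (target - loc)/4 for a branch-like
   instruction. On success it stores the YZ field and reports whether the
   backward form of the opcode is needed. */
bool encode_branch(uint64_t loc, uint64_t target,
                   uint16_t *yz, bool *backward)
{
    int64_t diff, r;
    assert(yz != NULL && backward != NULL);
    diff = (int64_t)(target - loc);      /* two's complement difference */
    if (diff & 3) return false;          /* assumption: must be a multiple of 4 */
    r = diff / 4;
    if (r < -65536 || r >= 65536) return false;   /* need -2^16 <= r < 2^16 */
    *backward = (r < 0);
    *yz = (uint16_t)(r < 0 ? r + 65536 : r);
    return true;
}
```

A JMP would use the same computation with 2^24 in place of 2^16 and a three-byte XYZ field.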

MMIX operations that refer to memory, like LDO and STHT and GO, are treated like
ADD if they have three operands, except that the first operand should be pure (not 
a register number) in the case of PRELD, PREGO, PREST, STCO, SYNCD, and SYNCID. 
These opcodes also accept a special two-operand form, in which the second operand 
stands for a base address and an immediate offset (see below). 

The first operand of PUSHJ and PUSHGO can be either a pure number or a register
number. In the first case (‘PUSHJ 2,Sub’ or ‘PUSHGO 2,Sub’) the programmer might
be thinking “let’s push down two registers”; in the second case (‘PUSHJ $2,Sub’ or
‘PUSHGO $2,Sub’) the programmer might be thinking “let’s make register 2 the hole
position for this subroutine call.” Both cases result in the same assembled output. 

The remaining MMIX opcodes are idiosyncratic:

        NEG r,p,z;
        PUT s,z;
        GET r,s;
        POP p,yz;
        RESUME xyz;
        SAVE r,0;
        UNSAVE r;
        SYNC xyz;
        TRAP x,y,z or TRAP x,yz or TRAP xyz;

SWYM and TRIP are like TRAP. Here s is an integer between 0 and 31, preferably given 
by one of the predefined symbols rA, rB, . . . for special register codes; r is a register 
number; p is a pure byte; x, y, and z are either register numbers or pure bytes; yz 
and xyz are pure values that fit respectively in two and three bytes. 

All of these rules can be summarized by saying that MMIXAL treats each MMIX opcode 
in the most natural way. When there are three operands, they affect fields X, Y, and Z 
of the assembled MMIX instruction; when there are two operands, they affect fields X
and YZ; when there is just one operand, it affects field XYZ. 
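
The three operand layouts correspond to three ways of packing a tetrabyte, sketched here in C. The function names asm_xyz3, asm_xyz2, and asm_xyz1 are hypothetical; the opcode is assumed to occupy the most significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Three operands: fields X, Y, and Z. */
uint32_t asm_xyz3(uint8_t op, uint8_t x, uint8_t y, uint8_t z)
{
    return ((uint32_t)op << 24) | ((uint32_t)x << 16) | ((uint32_t)y << 8) | z;
}

/* Two operands: fields X and YZ. */
uint32_t asm_xyz2(uint8_t op, uint8_t x, uint16_t yz)
{
    return ((uint32_t)op << 24) | ((uint32_t)x << 16) | yz;
}

/* One operand: field XYZ, which must fit in three bytes. */
uint32_t asm_xyz1(uint8_t op, uint32_t xyz)
{
    assert(xyz <= 0xffffff);
    return ((uint32_t)op << 24) | xyz;
}
```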

15. In all cases when the opcode corresponds to an MMIX operation, the MMIXAL 
instruction tells the assembler to carry out four steps: (1) Align the current location 
so that it is a multiple of 4, by adding 1, 2, or 3 if necessary; (2) Define the equivalent
of the label field to be the current location, if the label is nonempty; (3) Evaluate
the operands and assemble the specified MMIX instruction into the current location;
(4) Increase the current location by 4. 

16. Now let’s consider the pseudo-operations, starting with the simplest cases.

• ⟨label⟩ IS ⟨expression⟩ defines the value of the label to be the value of the
expression, which must not be a future reference. The expression may be either
pure or a register number.

• ⟨label⟩ LOC ⟨expression⟩ first defines the label to be the value of the current
location, if the label is nonempty. Then the current location is changed to the value
of the expression, which must be pure.

For example, ‘LOC #1000’ will start assembling subsequent instructions or data in
the location whose hexadecimal value is #1000. ‘X LOC @+500’ defines X to be the address
of the first of 500 bytes in memory; assembly will continue at location X + 500. The
operation of aligning the current location to a multiple of 256, if it is not already
aligned in that way, can be expressed as ‘LOC @+(256-@)&255’.

A less trivial example arises if we want to emit instructions and data into two 
separate areas of memory, but we want to intermix them in the MMIXAL source file. 
We could start by defining 8H and 9H to be the starting addresses of the instruction 
and data segments, respectively. Then, a sequence of instructions could be enclosed 
in ‘LOC 8B; ...; 8H IS @’; a sequence of data could be enclosed in ‘LOC 9B; ...;
9H IS @’. Any number of such sequences could then be combined. Instead of the
two pseudo-instructions ‘8H IS @; LOC 9B’ one could in fact write simply ‘8H LOC 9B’
when switching from instructions to data.

• PREFIX ⟨symbol⟩ redefines the current prefix to be the given symbol (fully qualified).
The label field should be blank.
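
The alignment idiom given for LOC in this section can be checked with a one-line C function. The name align256 is hypothetical; the extra parentheses are needed because C gives & a lower precedence than +.

```c
#include <assert.h>
#include <stdint.h>

/* The MMIXAL expression @+(256-@)&255, written with C's precedence rules. */
uint64_t align256(uint64_t loc)
{
    uint64_t r = loc + ((256 - loc) & 255);   /* unsigned wraparound is intended */
    assert((r & 255) == 0);                   /* result is a multiple of 256 */
    return r;
}
```

A location that is already a multiple of 256 is left unchanged.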

17. The next pseudo-operations assemble bytes, wydes, tetrabytes, or octabytes of
data.

• ⟨label⟩ BYTE ⟨expression list⟩ defines the label to be the current location, if the
label field is nonempty; then it assembles one byte for each expression in the expression
list, and advances the current location by the number of bytes. The expressions should
all be pure numbers that fit in one byte.

String constants are often used in such expression lists. For example, if the current
location is #1000, the instruction BYTE "Hello",0 assembles six bytes containing
the constants ’H’, ’e’, ’l’, ’l’, ’o’, and 0 into locations #1000, ..., #1005, and
advances the current location to #1006.

• ⟨label⟩ WYDE ⟨expression list⟩ is similar, but it first makes the current location
even, by adding 1 to it if necessary. Then it defines the label (if a nonempty label is
present), and assembles each expression as a two-byte value. The current location is
advanced by twice the number of expressions in the list. The expressions should all
be pure numbers that fit in two bytes.

• ⟨label⟩ TETRA ⟨expression list⟩ is similar, but it aligns the current location to a
multiple of 4 before defining the label; then it assembles each expression as a four-byte
value. The current location is advanced by 4n if there are n expressions in the list.
Each expression should be a pure number that fits in four bytes.

• ⟨label⟩ OCTA ⟨expression list⟩ is similar, but it first aligns the current location to
a multiple of 8; it assembles each expression as an eight-byte value. The current
location is advanced by 8n if there are n expressions in the list. Any or all of the
expressions may be future references, but they should all be defined as pure numbers 
eventually. 
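
The alignment and advancement rules of BYTE, WYDE, TETRA, and OCTA follow one pattern, sketched here in C. The type placement and the function place_data are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t label;   /* location that the label receives (after alignment) */
    uint64_t next;    /* current location after the data has been assembled */
} placement;          /* hypothetical type, named here for illustration */

/* size is 1, 2, 4, or 8 for BYTE, WYDE, TETRA, or OCTA;
   n is the number of items assembled. */
placement place_data(uint64_t loc, unsigned size, uint64_t n)
{
    placement pl;
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    pl.label = (loc + size - 1) & ~(uint64_t)(size - 1);   /* align upward */
    pl.next = pl.label + size * n;
    return pl;
}
```

For BYTE "Hello",0 at location #1000 this gives label #1000 and next location #1006, as in the example above.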

18. Global registers are important for accessing memory in MMIX programs. They
could be allocated by hand, and defined with IS instructions, but MMIXAL provides a
mechanism that is usually much more convenient:

• ⟨label⟩ GREG ⟨expression⟩ allocates a new global register, and assigns its number
as the equivalent of the label. At the beginning of assembly, the current global
threshold G is $255. Each distinct GREG instruction decreases G by 1; the final value
of G will be the initial value of rG when the assembled program is loaded.

The value of the expression will be loaded into the global register at the beginning
of the program. If this value is nonzero, it should remain constant throughout the
program execution; such global registers are considered to be base addresses. Two or
more base addresses with the same constant value are assigned to the same global 
register number. 

Base addresses can simplify memory accesses in an important way. Suppose, for 
example, five octabyte values appear in a data segment, and their addresses are called 
AA, BB, CC, DD, and EE: 

        AA LOC @+8;BB LOC @+8;CC LOC @+8;DD LOC @+8;EE LOC @+8

Then if you say Base GREG AA, you will be able to write simply ‘LDO $1,AA’ to bring
AA into register $1, and ‘LDO $2,CC’ to bring CC into register $2.

Here’s how it works: Whenever a memory operation such as LDO or STB or GO has
only two operands, the second operand should be a pure number whose value can be
expressed as b + δ, where 0 ≤ δ < 256 and b is the value of a base address in one of the
preceding GREG commands. The MMIXAL processor will find the closest base address
and manufacture an appropriate command. For example, the instruction ‘LDO $2,CC’
in the example of the preceding paragraph would be converted automatically to
‘LDO $2,Base,16’.
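
The search for a base address can be sketched in C as a linear scan. The function find_base and its array-based interface are hypothetical; MMIXAL's own data structures may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: base[i] holds the constant value of the i-th base
   address. Returns the index of the closest base b with 0 <= addr - b < 256
   and stores the offset in *delta, or returns -1 if no base qualifies. */
int find_base(const uint64_t *base, size_t n, uint64_t addr, uint8_t *delta)
{
    int best = -1;
    uint64_t best_off = 256;
    assert(delta != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t off = addr - base[i];   /* wraps to a huge value if base > addr */
        if (off < best_off) { best_off = off; best = (int)i; }
    }
    if (best >= 0) *delta = (uint8_t)best_off;
    return best;
}
```

With a single base equal to AA, the address CC = AA + 16 yields offset 16, matching the instruction shown above.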

If no base address is close enough, an error message will be generated, unless this 
program is run with the -x option on the command line. The -x option inserts 
additional instructions if necessary, using global register 255, so that any address is 
accessible. For example, if there is no base address that allows LDO $2,FF to be 
implemented in a single instruction, but if FF equals Base+1000, then the -x option 
would assemble two instructions, 

        SETL $255,1000; LDO $2,Base,$255

in place of LDO $2,FF. Caution: The -x feature makes the number of actual MMIX
instructions hard to predict, so extreme care must be used if your style of coding
includes relative branch instructions in dangerous forms like ‘BNZ x,@+8’.

This base address convention can be used also with the alias operation LDA. For
example, ‘LDA $3,CC’ loads the address of CC into register 3, by assembling the
instruction ‘ADDU $3,Base,16’.

MMIXAL also allows a two-operand form for memory operations such as

        LDO $1,$2

to be an abbreviation for ‘LDO $1,$2,0’.

When MMIXAL programs use subroutines with a memory stack in addition to the
built-in register stack, they usually begin with the instructions ‘sp GREG 0; fp GREG 0’;
these instructions allocate a stack pointer sp=$254 and a frame pointer fp=$253. 
However, subroutine libraries are free to implement any conventions for global regis- 
ters and stacks that they like. 

19. Short programs rarely run out of global registers, but long programs need a
mechanism to check that GREG hasn’t been used too often. The following pseudo- 
instruction provides the necessary safety valve: 

• LOCAL ⟨expression⟩ ensures that the expression will be a local register in the
program being assembled. The expression should be a register number, and the 
label field should be blank. At the close of assembly, MMIXAL will report an error if 
the final value of G does not exceed all register numbers that are declared local in 
this way. 

A LOCAL instruction need not be given unless the register number is 32 or more. 
(MMIX always considers $0 through $31 to be local, so MMIXAL implicitly acts as if the 
instruction ‘LOCAL $31’ were present.) 

20. Finally, there are two pseudo-instructions to pass information and hints to the
loading routine and/or to debuggers that will be using the assembled program.

• BSPEC ⟨expression⟩ begins “special mode”; the ⟨expression⟩ should have a value
that fits in two bytes, and the label field should be blank.

• ESPEC ends “special mode”; the operand field is ignored, and the label field should
be blank.

All material assembled between BSPEC and ESPEC is passed directly to the output, 
but not loaded as part of the assembled program. Ordinary MMIX instructions cannot 
appear in special mode; only the pseudo-operations IS, PREFIX, BYTE, WYDE, TETRA, 
OCTA, GREG, and LOCAL are allowed. The operand of BSPEC should have a value
that fits in two bytes; this value identifies the kind of data that follows. (For
example, BSPEC 0 might introduce information about subroutine calling conventions 
at the current location, and BSPEC 1 might introduce line numbers from a high-level- 
language program that was compiled into the code at the current place. System 
routines often need to pass such information through an assembler to the operating 
system, hence MMIXAL provides a general-purpose conduit.) 

21. A program should begin at the special symbolic location Main (more precisely,
at the address corresponding to the fully qualified symbol :Main). This symbol always
has serial number 1, and it must always be defined. 

Locations should not receive assembled data more than once. (More precisely, the 
loader will load the bitwise xor of all the data assembled for each byte position; but the 
general rule “do not load two things into the same byte” is safest.) All locations that 
do not receive assembled data are initially zero, except that the loading routine will 
put register stack data into segment 3, and the operating system may put command 
line data and debugger data into segment 2. (The rudimentary MMIX operating system 
starts a program with the number of command line arguments in $0, and a pointer 
to the beginning of an array of argument pointers in $1.) Segments 2 and 3 should 
not get assembled data, unless the user is a true hacker who is willing to take the risk 
that such data might crash the system. 

22. Binary MMO output. When the MMIXAL processor assembles a file called
foo.mms, it produces a binary output file called foo.mmo. (The suffix mms stands 
for “MMIX symbolic,” and mmo stands for “MMIX object.”) Such mmo files have a 
simple structure consisting of a sequence of tetrabytes. Some of the tetrabytes are 
instructions to a loading routine; others are data to be loaded. 

Loader instructions are distinguished from tetrabytes of data by their first (most 
significant) byte, which has the special escape-code value #98, called mm in the
program below. This code value corresponds to MMIX’s opcode LDVTS, which is
unlikely to occur in tetras of data. The second byte X of a loader instruction is
the loader opcode, called the lopcode. The third and fourth bytes, Y and Z, are
operands. Sometimes they are combined into a single 16-bit operand called YZ.

        #define mm #98
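
Splitting a tetrabyte into these fields is straightforward; the following C sketch shows one way to do it. The type loader_instr and the function decode_loader_instr are hypothetical names, and the constant is written 0x98 because that is how C spells the hexadecimal value #98.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MM 0x98   /* the escape code, called mm in the text */

typedef struct {
    uint8_t lopcode;   /* second byte, X */
    uint8_t y, z;      /* third and fourth bytes */
    uint16_t yz;       /* the same two bytes as one 16-bit operand */
} loader_instr;        /* hypothetical type, named here for illustration */

/* Returns true and fills *li if the tetrabyte begins with the escape code;
   returns false for an ordinary tetrabyte of data. */
bool decode_loader_instr(uint32_t tetra, loader_instr *li)
{
    assert(li != NULL);
    if ((tetra >> 24) != MM) return false;
    li->lopcode = (uint8_t)((tetra >> 16) & 0xff);
    li->y = (uint8_t)((tetra >> 8) & 0xff);
    li->z = (uint8_t)(tetra & 0xff);
    li->yz = (uint16_t)(tetra & 0xffff);
    return true;
}
```

A real loader must also remember context: a tetrabyte that follows a quotation lopcode is data even if it begins with #98.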

23. A small, contrived example will help explain the basic ideas of mmo format.
Consider the following input file, called test.mms: 

% A peculiar example of MMIXAL
        LOC   Data_Segment   % location #2000000000000000
        OCTA  1F             % a future reference
a       GREG  @              % $254 is base address for ABCD
ABCD    BYTE  "ab"           % two bytes of data
        LOC   #123456789     % switch to the instruction segment
Main    JMP   1F             % another future reference
        LOC   @+#4000        % skip past 16384 bytes
2H      LDB   $3,ABCD+1      % use the base address
        BZ    $3,1F; TRAP    % and refer to the future again
# 3 "foo.mms"                % this comment is a line directive
        LOC   2B-4*10        % move 10 tetras before previous location
1H      JMP   2B             % resolve previous references to 1F
        BSPEC 5              % begin special data of type 5
        TETRA &a<<8          % four bytes of special data
        WYDE  a-$0           % two more bytes of special data
        ESPEC                % end a special data packet
        LOC   ABCD+2         % resume the data segment
        BYTE  "cd",#98       % assemble three more bytes of data

It defines a silly program that essentially puts ’b’ into register 3; the program halts 
when it gets to an all-zero TRAP instruction following the BZ. But the assembled output 
of this file illustrates most of the features of MMIX objects, and in fact test.mms was 
the first test file tried by the author when the MMIXAL processor was originally written. 

The binary output file test.mmo assembled from test.mms consists of the following
tetrabytes, shown in hexadecimal notation with brief comments. Fuller explanations 
appear with the descriptions of individual lopcodes below. 



98090101   lop_pre 1,1 (preamble, version 1, 1 tetra)
36f4a363   (the file creation time)
98012001   lop_loc #20,1 (data segment, 1 tetra)
00000000   (low tetrabyte of address in data segment)
00000000   (high tetrabyte of OCTA 1F)
00000000   (low tetrabyte, will be fixed up later)
61620000   ("ab", padded with trailing zeros)
98010002   lop_loc 0,2 (instruction segment, 2 tetras)
00000001   (high tetrabyte of address in instruction segment)
2345678c   (low tetrabyte of address, after alignment)
98060002   lop_file 0,2 (file name 0, 2 tetras)
74657374   ("test")
2e6d6d73   (".mms")
98070007   lop_line 7 (line 7 of the current file)
f0000000   (JMP 1F, will be fixed up later)
98024000   lop_skip #4000 (advance 16384 bytes)
98070009   lop_line 9 (line 9 of the current file)
8103fe01   (LDB $3,a,1, uses base address a)
42030000   (BZ $3,1F, will be fixed later)
9807000a   lop_line 10 (stay on line 10)
00000000   (TRAP)
98010002   lop_loc 0,2 (instruction segment, 2 tetras)
00000001   (high tetrabyte of address in instruction segment)
2345a768   (low tetrabyte of address 1H)
98050010   lop_fixrx 16 (fix 16-bit relative address)
0100fff5   (fixup for location @-4*-11)
98040ff7   lop_fixr #ff7 (fix @-4*#ff7)
98032001   lop_fixo #20,1 (data segment, 1 tetra)
00000000   (low tetrabyte of data segment address to fix)
98060102   lop_file 1,2 (file name 1, 2 tetras)
666f6f2e   ("foo.")
6d6d7300   ("mms",0)
98070004   lop_line 4 (line 4 of the current file)
f000000a   (JMP 2B)
98080005   lop_spec 5 (begin special data of type 5)
00000200   (TETRA &a<<8)
00fe0000   (WYDE a-$0)
98012001   lop_loc #20,1 (data segment, 1 tetra)
0000000a   (low tetrabyte of address in data segment)
00006364   ("cd" with leading zeros, because of alignment)
98000001   lop_quote (don’t treat next tetrabyte as a lopcode)
98000000   (byte #98, padded with trailing zeros)
980a00fe   lop_post $254 (begin postamble, G is 254)
20000000   (high tetrabyte of the initial contents of $254)
00000008   (low tetrabyte of base address $254)
00000001   (high tetrabyte of the initial contents of $255)
2345678c   (low tetrabyte of $255, is address of Main)
980b0000   lop_stab (begin symbol table)
203a5040   (compressed form for symbol table as a ternary trie)
50404020
41204220
43094408
83404020   (ABCD = #2000000000000008, serial 3)
4d206120
69056e01
2345678c
81400f61   (Main = #000000012345678c, serial 1)
fe820000   (a = $254, serial 2)
980c000a   lop_end (end symbol table, 10 tetras)



24. When a tetrabyte of the mmo file does not begin with the escape code, it is loaded
into the current location A, and A is increased to the next higher multiple of 4. (If
A is not a multiple of 4, the tetrabyte actually goes into location A ∧ (−4) = 4⌊A/4⌋,
according to MMIX’s usual conventions.) The current line number is also increased 
by 1, if it is nonzero. 
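
The address arithmetic for an ordinary tetrabyte can be sketched in C as follows. The function load_address is a hypothetical helper; it returns the address that receives the tetrabyte and advances the current location.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the address A & (-4) that receives the tetrabyte, and moves the
   current location *cur to the next higher multiple of 4. */
uint64_t load_address(uint64_t *cur)
{
    uint64_t at;
    assert(cur != NULL);
    at = *cur & ~(uint64_t)3;   /* 4*floor(A/4) */
    *cur = at + 4;
    return at;
}
```

For example, when the current location is #200000000000000a, the tetrabyte goes into #2000000000000008, so its last two bytes occupy #...0a and #...0b.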

When a tetrabyte does begin with the escape code, its next byte is the lopcode 
defining a loader instruction. There are thirteen lopcodes: 

• lop_quote: X = ’^00, YZ = 1. Treat the next tetra as an ordinary tetrabyte, even 
if it begins with the escape code. 

• lopJoc: X = ’^01, Y = high byte, Z = tetra count (Z = 1 or 2). Set the current 
location to the 64-bit address defined by the next Z tetras, plus 2®®Y. Usually Y = 0 
(for the instruction segment) or Y = *20 (for the data segment). If Z = 2, the high 
tetra appears first. 

• lop_skip: X = *02, YZ = delta. Increase the current location by YZ. 

• lop_fixo: X = *03, Y = high byte, Z = tetra count (Z = 1 or 2). Load the value of 
the current location A into octabyte P, where P is the 64-bit address defined by the 
next Z tetras plus 2®®Y as in lop Joe. (The octabyte at P was previously assembled 
as zero because of a future reference.) 

• lop_fixr: X = *04, YZ = delta. Load YZ into the YZ field of the tetrabyte in 
location P, where P is A — 4YZ, namely the address that precedes the current location 
by YZ tetrabytes. (This tetrabyte was previously loaded with an MMIX instruction 
that takes a relative address: a branch, probable branch, JMP, PUSHJ, or GETA. Its 
YZ field was previously assembled as zero because of a future reference.) 

• lop-fixrx: X = *05, Y = 0, Z = 16 or 24. Proceed as in lop-fixr, but load S into 
tetrabyte P = A — 4<5 instead of loading YZ into P = A — 4YZ. Here <5 is the value of 
the tetrabyte following the lop-fixrx instruction; its leading byte will be either 0 or 1. 
If the leading byte is 1, i5 should be treated as the negative number (i5A*ffffff) — 2^ 
when calculating the address P. (The latter case arises only rarely, but it is needed 
when fixing up a relative “future” reference that ultimately leads to a “backward” 
instruction. The value of 6 that is xored into location P in such cases will change BZ 
to BZB, or JMP to JMPB, etc.; we have Z = 24 when fixing a JMP, Z = 16 otherwise.) 

• lop_file: X = *06, Y = file number, Z = tetra count. Set the current file number 
to Y and the current line number to zero. If this file number has occurred previously, 
Z should be zero; otherwise Z should be positive, and the next Z tetrabytes are the 
characters of the file name in big-endian order. Trailing zeros follow the file name if 
its length is not a multiple of 4. 

• lopJine: X = *07, YZ = line number. Set the current line number to YZ. If 
the line number is nonzero, the current file and current line should correspond to 
the source location that generated the next data to be loaded, for use in diagnostic 
messages. (The MMIXAL processor gives precise line numbers to the sources of tetra- 
bytes in segment 0, which tend to be instructions, but not to the sources of tetrabytes 
assembled in other segments.) 

• lop_spec: X = *08, YZ = type. Begin special data of type YZ. The subsequent 
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tetrabytes, continuing until the next loader operation other than lop^quote, comprise 
the special data. A lop-quote instruction allows tetrabytes of special data to begin 
with the escape code. 

• lop^pre: X = *09, Y = 1, Z = tetra count. A lop-pre instruction, which defines 
the “preamble,” must be the first tetrabyte of every mmo file. The Y field specifies the 
version number of mmo format, currently 1; other version numbers may be defined later, 
but version 1 should always be supported as described in the present document. The 
Z tetrabytes following a lop-pre command provide additional information that might 
be of interest to system routines. If Z > 0, the first tetra of additional information 
records the time that this mmo file was created, measured in seconds since 00:00:00 
Greenwich Mean Time on 1 Jan 1970. 

• lop.post: X = *0a, Y = 0, Z = G (must be 32 or more). This instruction begins the 
postamble, which follows all instructions and data to be loaded. It causes the loaded 
program to begin with rG equal to the stated value of G, and with $G, G + 1, . . . , $255 
initially set to the values of the next (256 — G) * 2 tetrabytes. These tetrabytes specify 
256 — G octabytes in big-endian fashion (high half first). 

• lop_stab: X = #0b, YZ = 0. This instruction must appear immediately after
the (256 − G) * 2 tetrabytes following lop_post. It is followed by the symbol table,
which lists the equivalents of all user-defined symbols in a compact form that will be
described later.

• lop_end: X = #0c, YZ = tetra count. This instruction must be the very last
tetrabyte of each mmo file. Furthermore, exactly YZ tetrabytes must appear between
it and the lop_stab command. (Therefore a program can easily find the symbol table
without reading forward through the entire mmo file.)
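
The field layout described in the bullets above can be illustrated with a small sketch. It assumes that a loader operation is a big-endian tetrabyte whose first byte is the escape code mm = #98, followed by the lopcode X and the fields Y and Z. The names lop_fields, split_tetra, is_valid_preamble, and postamble_tetras are hypothetical helpers invented for this illustration; they are not part of MMIXAL or MMOtype.

```c
#include <stdint.h>

/* Escape code and a few lopcodes, with the values quoted in the text. */
enum { MM = 0x98, LOP_PRE = 0x09, LOP_POST = 0x0a };

/* Hypothetical helper type: the four bytes of one big-endian tetrabyte. */
typedef struct {
    uint8_t esc, x, y, z;
} lop_fields;

lop_fields split_tetra(uint32_t t)
{
    lop_fields f;
    f.esc = (uint8_t)(t >> 24);
    f.x = (uint8_t)(t >> 16);
    f.y = (uint8_t)(t >> 8);
    f.z = (uint8_t)t;
    return f;
}

/* Returns 1 if t is a version-1 preamble (lop_pre with Y = 1). */
int is_valid_preamble(uint32_t t)
{
    lop_fields f = split_tetra(t);
    return f.esc == MM && f.x == LOP_PRE && f.y == 1;
}

/* Returns the number of tetrabytes of register data that follow a
   lop_post tetrabyte, or -1 if t is not a legal lop_post (G < 32). */
int postamble_tetras(uint32_t t)
{
    lop_fields f = split_tetra(t);
    if (f.esc != MM || f.x != LOP_POST || f.y != 0 || f.z < 32) return -1;
    return (256 - f.z) * 2;
}
```

For instance, a postamble with G = 255 is followed by two tetrabytes, which together give the initial octabyte value of $255.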



A separate routine called MMOtype is available to translate binary mmo files into 
human-readable form. 

#define lop_quote 0x0  /* the quotation lopcode */
#define lop_loc   0x1  /* the location lopcode */
#define lop_skip  0x2  /* the skip lopcode */
#define lop_fixo  0x3  /* the octabyte-fix lopcode */
#define lop_fixr  0x4  /* the relative-fix lopcode */
#define lop_fixrx 0x5  /* extended relative-fix lopcode */
#define lop_file  0x6  /* the file name lopcode */
#define lop_line  0x7  /* the file position lopcode */
#define lop_spec  0x8  /* the special hook lopcode */
#define lop_pre   0x9  /* the preamble lopcode */
#define lop_post  0xa  /* the postamble lopcode */
#define lop_stab  0xb  /* the symbol table lopcode */
#define lop_end   0xc  /* the end-it-all lopcode */

25. Many readers will have noticed that MMIXAL has no facilities for relocatable
output, nor does mmo format support such features. The author’s first drafts of MMIXAL 
and mmo did allow relocatable objects, with external linkages, but the rules were 
substantially more complicated and therefore inconsistent with the goals of The Art 
of Computer Programming. The present design might actually prove to be superior to 
the current practice, now that computer memory is significantly cheaper than it used 
to be, because one-pass assembly and loading are extremely fast when relocatability 
and external linkages are disallowed. Different program modules can be assembled 
together about as fast as they could be linked together under a relocatable scheme, 
and they can communicate with each other in much more flexible ways. Debugging 
tools are enhanced when open-source libraries are combined with user programs, and 
such libraries will certainly improve in quality when their source form is accessible to 
a larger community of users.

26. Basic data types. This program for the 64-bit MMIX architecture is based
on 32-bit integer arithmetic, because nearly every computer available to the author
at the time of writing was limited in that way. Details of the basic arithmetic appear 
in a separate program module called MMIX-ARITH, because the same routines are 
needed also for the simulators. The definition of type tetra should be changed, if 
necessary, to conform with the definitions found in MMIX-ARITH. 

⟨Type definitions 26⟩ ≡

typedef unsigned int tetra; /* assumes that an int is exactly 32 bits wide */

typedef struct {
  tetra h, l;
} octa; /* two tetrabytes make one octabyte */

typedef enum { 
false, true 
} bool; 

See also sections 30, 54, 58, 62, 68, and 82. 

This code is used in section 136. 



27. ⟨Global variables 27⟩ ≡

extern octa zero_octa; /* zero_octa.h = zero_octa.l = 0 */
extern octa neg_one; /* neg_one.h = neg_one.l = -1 */
extern octa aux; /* auxiliary output of a subroutine */
extern bool overflow; /* set by certain subroutines for signed arithmetic */

See also sections 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, and 143. 
This code is used in section 136. 



28. Most of the subroutines in MMIX-ARITH return an octabyte as a function of
two octabytes; for example, oplus(y, z) returns the sum of octabytes y and z. Division
inputs the high half of a dividend in the global variable aux and returns the remainder
in aux.

⟨Subroutines 28⟩ ≡

extern octa oplus ARGS((octa y, octa z)); /* unsigned y + z */
extern octa ominus ARGS((octa y, octa z)); /* unsigned y − z */
extern octa incr ARGS((octa y, int delta)); /* unsigned y + δ (δ is signed) */
extern octa oand ARGS((octa y, octa z)); /* y ∧ z */
extern octa shift_left ARGS((octa y, int s)); /* y << s, 0 ≤ s ≤ 64 */
extern octa shift_right ARGS((octa y, int s, int u)); /* y >> s, signed if !u */
extern octa omult ARGS((octa y, octa z)); /* unsigned (aux, x) = y × z */
extern octa odiv ARGS((octa x, octa y, octa z)); /* unsigned (x, y)/z; aux = (x, y) mod z */

See also sections 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, and 74. 

This code is used in section 136. 
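
To make the "octabyte as two tetrabytes" convention concrete, here is a minimal sketch of unsigned addition and subtraction on the two halves. It is only an illustration under the assumption that tetra is exactly 32 bits wide (uint32_t is used to guarantee that); the names oplus_sketch and ominus_sketch are invented here, and the real routines live in MMIX-ARITH, whose code is not shown in this section.

```c
#include <stdint.h>

typedef uint32_t tetra;            /* exactly 32 bits, unlike plain int */
typedef struct { tetra h, l; } octa;

/* Unsigned 64-bit addition on two 32-bit halves. */
octa oplus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l + z.l;               /* wraps modulo 2^32 */
    x.h = y.h + z.h + (x.l < y.l); /* carry if the low half wrapped */
    return x;
}

/* Unsigned 64-bit subtraction on two 32-bit halves. */
octa ominus_sketch(octa y, octa z)
{
    octa x;
    x.l = y.l - z.l;
    x.h = y.h - z.h - (y.l < z.l); /* borrow if the low half underflowed */
    return x;
}
```

Adding 1 to the octabyte (0, 0xffffffff) yields (1, 0); the comparison against the original low half is what detects the carry.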



ARGS = macro (), §31.
aux: octa, MMIX-ARITH §4.
incr: octa (), MMIX-ARITH §6.
neg_one: octa, MMIX-ARITH §4.
oand: octa (), MMIX-ARITH §25.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
oplus: octa (), MMIX-ARITH §5.
overflow: bool, MMIX-ARITH §4.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
zero_octa: octa, MMIX-ARITH §4.

29. Here's a rudimentary check to see if arithmetic is in trouble.

⟨Initialize everything 29⟩ ≡

acc = shift_left(neg_one, 1);
if (acc.h != 0xffffffff) panic("Type tetra is not implemented correctly");

See also sections 32, 61, 71, 84, 91, and 140. 

This code is used in section 136. 
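
Why the check in section 29 detects a wrong tetra type can be seen from a small sketch. Shifting −1 left by one bit must leave the high half equal to 0xffffffff; if tetra were wider than 32 bits, the high half would keep extra bits and the comparison would fail. The function shift_left_sketch below is an assumption-laden stand-in for the MMIX-ARITH routine, not its actual code.

```c
#include <stdint.h>

typedef uint32_t tetra;
typedef struct { tetra h, l; } octa;

/* Left shift of an octabyte by s bits, 0 <= s <= 64 (sketch only). */
octa shift_left_sketch(octa y, int s)
{
    while (s >= 32) {              /* move whole tetrabytes first */
        y.h = y.l; y.l = 0; s -= 32;
    }
    if (s) {                       /* then shift the remaining 1..31 bits */
        y.h = (y.h << s) | (y.l >> (32 - s));
        y.l <<= s;
    }
    return y;
}

/* Mirrors the sanity check of section 29. */
int tetra_looks_sane(void)
{
    octa neg_one = { 0xffffffffu, 0xffffffffu };
    octa acc = shift_left_sketch(neg_one, 1);
    return acc.h == 0xffffffffu;
}
```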

30. Future versions of this program will work with symbols formed from Unicode 
characters, but the present code limits itself to an 8-bit subset. The type Char is 
defined here in order to ease the later transition: At present, Char is the same as 
char, but Char can be changed to a 16-bit type in the Unicode version. 

Other changes will also be necessary when the transition to Unicode is made; for
example, some calls of fprintf will become calls of fwprintf, and some occurrences of
%s will become %ls in print formats. The switchable type name Char provides at
least a first step towards a brighter future with Unicode.

⟨Type definitions 26⟩ +≡

typedef char Char; /* bytes that will become wydes some day */ 

31. While we're talking about classic systems versus future systems, we might as
well define the ARGS macro, which makes function prototypes available on ANSI C
systems without making them uncompilable on older systems. Each subroutine below
is declared first with a prototype, then with an old-style definition.

⟨Preprocessor definitions 31⟩ ≡

#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

See also section 39. 

This code is used in section 136. 
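
A brief illustration of how the ARGS macro is used may help. The doubled parentheses pass the whole parameter list as a single macro argument, so the declaration expands to a full prototype on a Standard C compiler and to an empty parameter list otherwise. The function add_two is a hypothetical example; it is defined here in modern style rather than the old-style form used throughout the program, because old-style definitions are no longer accepted by the newest C standard.

```c
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

/* Expands to "int add_two (int, int);" when __STDC__ is defined,
   and to "int add_two ();" on a pre-ANSI compiler. */
int add_two ARGS((int, int));

int add_two(int a, int b)
{
    return a + b;
}
```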






MMIXAL: BASIC INPUT AND OUTPUT 



32. Basic input and output. Input goes into a buffer that is normally limited 
to 72 characters. This limit can be raised, by using the -b option when invoking 
the assembler; but short buffers will keep listings from becoming unwieldy, because a 
symbolic listing adds 19 characters per line. 

⟨Initialize everything 29⟩ +≡
if (buf_size < 72) buf_size = 72;
buffer = (Char *) calloc(buf_size + 1, sizeof(Char));
lab_field = (Char *) calloc(buf_size + 1, sizeof(Char));
op_field = (Char *) calloc(buf_size, sizeof(Char));
operand_list = (Char *) calloc(buf_size, sizeof(Char));
err_buf = (Char *) calloc(buf_size + 60, sizeof(Char));
if (!buffer || !lab_field || !op_field || !operand_list || !err_buf)
  panic("No room for the buffers");

33. ⟨Global variables 27⟩ +≡

Char *buffer; /* raw input of the current line */
Char *buf_ptr; /* current position within buffer */
Char *lab_field; /* copy of the label field of the current instruction */
Char *op_field; /* copy of the opcode field of the current instruction */
Char *operand_list; /* copy of the operand field of the current instruction */
Char *err_buf; /* place where dynamic error messages are sprinted */

34. ⟨Get the next line of input text, or break if the input has ended 34⟩ ≡
if (!fgets(buffer, buf_size + 1, src_file)) break;
line_no++;
line_listed = false;
j = strlen(buffer);
if (buffer[j - 1] == '\n') buffer[j - 1] = '\0'; /* remove the newline */
else if ((j = fgetc(src_file)) != EOF) ⟨Flush the excess part of an overlong line 35⟩;
if (buffer[0] == '#') ⟨Check for a line directive 38⟩;
buf_ptr = buffer;

This code is used in section 136. 



__STDC__, Standard C.
acc: octa, §83.
buf_size: int, §139.
calloc: void *(), <stdlib.h>.
EOF = (−1), <stdio.h>.
false = 0, §26.
fgetc: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
fprintf: int (), <stdio.h>.
fwprintf: int (), multibyte string function.
h: tetra, §26.
j: register int, §136.
line_listed: bool, §36.
line_no: int, §36.
neg_one: octa, MMIX-ARITH §4.
panic = macro (), §45.
shift_left: octa (), MMIX-ARITH §7.
src_file: FILE *, §139.
strlen: size_t (), <string.h>.

35. ⟨Flush the excess part of an overlong line 35⟩ ≡
{
  while (j != '\n' && j != EOF) j = fgetc(src_file);
  if (!long_warning_given) {
    long_warning_given = true;
    err("*trailing characters of long input line have been dropped");
    fprintf(stderr,
        "(say `-b <number>' to increase the length of my input buffer)\n");
  } else err("*trailing characters dropped");
}

This code is used in section 34. 
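
The input logic of sections 34 and 35 can be condensed into one self-contained function. This is a sketch rather than the assembler's own code: read_line and its return convention are invented for illustration, warnings are left to the caller, and, unlike the code above, a line that exactly fills the buffer is not reported as truncated.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf, which must have room for size + 1 chars.
   Returns -1 at end of input, 0 if the whole line fit, and 1 if
   trailing characters had to be dropped. */
int read_line(FILE *f, char *buf, int size)
{
    size_t n;
    int c;
    if (!fgets(buf, size + 1, f)) return -1;
    n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n') {
        buf[n - 1] = '\0';                 /* remove the newline */
        return 0;
    }
    c = fgetc(f);
    if (c == EOF || c == '\n') return 0;   /* nothing was lost */
    while (c != '\n' && c != EOF) c = fgetc(f); /* flush the excess */
    return 1;
}
```

With a four-character buffer, the input line "abcdefgh" is stored as "abcd" and the remaining characters are discarded up to the newline, so the next call starts cleanly at the following line.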

36. ⟨Global variables 27⟩ +≡

int cur_file; /* index of the current file in filename */
int line_no; /* current position in the file */
bool line_listed; /* have we listed the buffer contents? */
bool long_warning_given; /* have we given the hint about -b? */

37. We keep track of source file name and line number at all times, for error 
reporting and for synchronization data in the object file. Up to 256 different source 
file names can be remembered. 

⟨Global variables 27⟩ +≡

Char *filename[257]; /* source file names, including those in line directives */
int filename_count; /* how many filename entries have we filled? */

38. If the current line is a line directive, it will also be treated as a comment by the 
assembler. 

⟨Check for a line directive 38⟩ ≡
{
  for (p = buffer + 1; isspace(*p); p++) ;
  for (j = 0; isdigit(*p); p++) j = 10 * j + *p - '0';
  for (; isspace(*p); p++) ;
  if (*p == '\"') {
    if (!filename[filename_count]) {
      filename[filename_count] = (Char *) calloc(FILENAME_MAX + 1, sizeof(Char));
      if (!filename[filename_count])
        panic("Capacity exceeded: Out of filename memory");
    }
    for (p++, k = 0; *p && *p != '\"' && k < FILENAME_MAX; p++, k++)
      filename[filename_count][k] = *p;
    if (k == FILENAME_MAX) panic("Capacity exceeded: File name too long");
    if (*p == '\"' && *(p - 1) != '\"') { /* yes, it's a line directive */
      filename[filename_count][k] = '\0';
      for (k = 0; strcmp(filename[k], filename[filename_count]) != 0; k++) ;
      if (k == filename_count) {
        if (filename_count == 256)
          panic("Capacity exceeded: More than 256 file names");
        filename_count++;
      }
      cur_file = k;
      line_no = j - 1;
    }
  }
}

This code is used in section 34. 
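
The recognition rule used above (a '#', optional spaces, a decimal number, optional spaces, and a nonempty quoted file name) can be isolated in a small standalone function. The name parse_line_directive and its interface are hypothetical. The sketch returns the number exactly as written, whereas the assembler stores j − 1 in line_no because the counter is incremented again when the next line is read.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 and fills *line and name if s has the form
   # <digits> "<name>"; otherwise returns 0 (an ordinary comment).
   At most cap - 1 characters of the name are accepted. */
int parse_line_directive(const char *s, int *line, char *name, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t k = 0;
    int n = 0;
    if (*p != '#') return 0;
    for (p++; isspace(*p); p++) ;
    for (; isdigit(*p); p++) n = 10 * n + (*p - '0');
    for (; isspace(*p); p++) ;
    if (*p != '"') return 0;
    for (p++; *p && *p != '"' && k + 1 < cap; p++) name[k++] = (char)*p;
    if (*p != '"' || k == 0) return 0; /* need a closing quote and a name */
    name[k] = '\0';
    *line = n;
    return 1;
}
```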

39. Archaic versions of the C library do not define FILENAME_MAX. 

⟨Preprocessor definitions 31⟩ +≡

#ifndef FILENAME_MAX
#define FILENAME_MAX 256
#endif

40. ⟨Local variables 40⟩ ≡

register Char *p, *q; /* the place where we're currently scanning */

See also section 65. 

This code is used in section 136. 

41. The next several subroutines are useful for preparing a listing of the assembled 
results. In such a listing, which the user can request with a command line option, 
we fill the leftmost 19 columns with a representation of the output that has been 
assembled from the input in the buffer. Sometimes the assembled output requires 
more than one line, because we have room to output only a tetrabyte per line. 

The flush_listing_line subroutine is called when we have finished generating one
line's worth of assembled material. Its parameter is a string to be printed between the
assembled material and the buffer contents, if the input line hasn't yet been echoed.
The length of this string should be 19 minus the number of characters already printed
on the current line of the listing.

⟨Subroutines 28⟩ +≡

void flush_listing_line ARGS((char *));
void flush_listing_line(s)
  char *s;
{
  if (line_listed) fprintf(listing_file, "\n");
  else {
    fprintf(listing_file, "%s%s\n", s, buffer);
    line_listed = true;
  }
}



ARGS = macro (), §31.
bool = enum, §26.
buffer: Char *, §33.
calloc: void *(), <stdlib.h>.
Char = char, §30.
EOF = (−1), <stdio.h>.
err = macro (), §45.
fgetc: int (), <stdio.h>.
FILENAME_MAX = macro, <stdio.h>.
fprintf: int (), <stdio.h>.
isdigit: int (), <ctype.h>.
isspace: int (), <ctype.h>.
j: register int, §136.
k: register int, §136.
listing_file: FILE *, §139.
panic = macro (), §45.
src_file: FILE *, §139.
stderr: FILE *, <stdio.h>.
strcmp: int (), <string.h>.
true = 1, §26.

42. Only the three least significant hex digits of a location are shown on the listing,
unless the other digits have changed. The following subroutine prints an extra line
when a change needs to be shown.

⟨Subroutines 28⟩ +≡

void update_listing_loc ARGS((int));
void update_listing_loc(k)
  int k; /* the location to display, mod 4 */
{
  if (cur_loc.h != listing_loc.h || ((cur_loc.l ^ listing_loc.l) & 0xfffff000)) {
    fprintf(listing_file, "%08x%08x:", cur_loc.h, (cur_loc.l & -4) | k);
    flush_listing_line("  ");
  }
  listing_loc.h = cur_loc.h; listing_loc.l = (cur_loc.l & -4) | k;
}

43. ⟨Global variables 27⟩ +≡

octa cur_loc; /* current location of assembled output */
octa listing_loc; /* current location on the listing */
unsigned char hold_buf[4]; /* assembled bytes */
unsigned char held_bits; /* which bytes of hold_buf are active? */
unsigned char listing_bits; /* which of them haven't been listed yet? */
bool spec_mode; /* are we between BSPEC and ESPEC? */
tetra spec_mode_loc; /* number of bytes in the current special output */

44. When bytes are assembled, they are placed into the hold_buf. More precisely, a
byte assembled for a location that is j plus a multiple of 4 is placed into hold_buf[j];
two auxiliary variables, held_bits and listing_bits, are then increased by 1 << j.
Furthermore, listing_bits is increased by #10 << j if that byte is a future reference to
be resolved later.

The bytes are held until we need to output them. The listing_clear routine lists any
that have been held but not yet shown. It should be called only when listing_bits ≠ 0.

⟨Subroutines 28⟩ +≡

void listing_clear ARGS((void));
void listing_clear()
{
  register int j, k;
  for (k = 0; k < 4; k++)
    if (listing_bits & (1 << k)) break;
  if (spec_mode) fprintf(listing_file, "         ");
  else {
    update_listing_loc(k);
    fprintf(listing_file, " ...%03x: ", (listing_loc.l & 0xffc) | k);
  }
  for (j = 0; j < 4; j++)
    if (listing_bits & (0x10 << j)) fprintf(listing_file, "xx");
    else if (listing_bits & (1 << j)) fprintf(listing_file, "%02x", hold_buf[j]);
    else fprintf(listing_file, "  ");
  flush_listing_line("  ");
  listing_bits = 0;
}



ARGS = macro (), §31.
bool = enum, §26.
BSPEC = 0x104, §62.
ESPEC = 0x105, §62.
flush_listing_line: void (), §41.
fprintf: int (), <stdio.h>.
h: tetra, §26.
l: tetra, §26.
listing_file: FILE *, §139.
octa = struct, §26.
tetra = unsigned int, §26.

45. Error messages are written to stderr. If the message begins with ‘*’ it is merely
a warning; if it begins with ‘!’ it is fatal; otherwise the error is probably serious enough
to make manual correction necessary, yet it is not tragic. Errors and warnings appear
also on the optional listing file.

#define err(m) { report_error(m); if (m[0] != '*') goto bypass; }
#define derr(m, p) { sprintf(err_buf, m, p); report_error(err_buf); if (err_buf[0] != '*') goto bypass; }
#define dderr(m, p, q) { sprintf(err_buf, m, p, q); report_error(err_buf); if (err_buf[0] != '*') goto bypass; }
#define panic(m) { sprintf(err_buf, "!%s", m); report_error(err_buf); }
#define dpanic(m, p) { err_buf[0] = '!'; sprintf(err_buf + 1, m, p); report_error(err_buf); }

⟨Subroutines 28⟩ +≡

void report_error ARGS((char *));
void report_error(message)
  char *message;
{
  if (!filename[cur_file]) filename[cur_file] = "(nofile)";
  if (message[0] == '*')
    fprintf(stderr, "\"%s\", line %d warning: %s\n",
        filename[cur_file], line_no, message + 1);
  else if (message[0] == '!')
    fprintf(stderr, "\"%s\", line %d fatal error: %s\n",
        filename[cur_file], line_no, message + 1);
  else {
    fprintf(stderr, "\"%s\", line %d: %s!\n", filename[cur_file], line_no, message);
    err_count++;
  }
  if (listing_file) {
    if (!line_listed) flush_listing_line("****************** ");
    if (message[0] == '*')
      fprintf(listing_file, "************ warning: %s\n", message + 1);
    else if (message[0] == '!')
      fprintf(listing_file, "******** fatal error: %s!\n", message + 1);
    else fprintf(listing_file, "********** error: %s!\n", message);
  }
  if (message[0] == '!') exit(-2);
}

46. ⟨Global variables 27⟩ +≡

int err_count; /* this many errors were found */

47. Output to the binary obj_file occurs four bytes at a time. The bytes are
assembled in small buffers, not output as single tetrabytes, because we want the output
to be big-endian even when the assembler is running on a little-endian machine.

#define mmo_write(buf)
  if (fwrite(buf, 1, 4, obj_file) != 4) dpanic("Can't write on %s", obj_file_name)

⟨Subroutines 28⟩ +≡

void mmo_clear ARGS((void));
void mmo_out ARGS((void));
unsigned char lop_quote_command[4] = {mm, lop_quote, 0, 1};

void mmo_clear() /* clears hold_buf, when held_bits != 0 */
{
  if (hold_buf[0] == mm) mmo_write(lop_quote_command);
  mmo_write(hold_buf);
  if (listing_file && listing_bits) listing_clear();
  held_bits = 0;
  hold_buf[0] = hold_buf[1] = hold_buf[2] = hold_buf[3] = 0;
  mmo_cur_loc = incr(mmo_cur_loc, 4); mmo_cur_loc.l &= -4;
  if (mmo_line_no) mmo_line_no++;
}

unsigned char mmo_buf[4];
int mmo_ptr;

void mmo_out() /* output the contents of mmo_buf */
{
  if (held_bits) mmo_clear();
  mmo_write(mmo_buf);
}



ARGS = macro (), §31.
bypass: label, §102.
cur_file: int, §36.
err_buf: Char *, §33.
exit: void (), <stdlib.h>.
filename: Char *[], §37.
flush_listing_line: void (), §41.
fprintf: int (), <stdio.h>.
fwrite: size_t (), <stdio.h>.
held_bits: unsigned char, §43.
hold_buf: unsigned char [], §43.
incr: octa (), MMIX-ARITH §6.
l: tetra, §26.
line_listed: bool, §36.
line_no: int, §36.
listing_bits: unsigned char, §43.
listing_clear: void (), §44.
listing_file: FILE *, §139.
lop_quote = 0x0, §24.
mm = 0x98, §22.
mmo_cur_loc: octa, §51.
mmo_line_no: int, §51.
obj_file: FILE *, §139.
obj_file_name: char [], §139.
sprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.

48. ⟨Subroutines 28⟩ +≡
void mmo_tetra ARGS((tetra));
void mmo_byte ARGS((unsigned char));
void mmo_lop ARGS((char, unsigned char, unsigned char));
void mmo_lopp ARGS((char, unsigned short));

void mmo_tetra(t) /* output a tetrabyte */
  tetra t;
{
  mmo_buf[0] = t >> 24; mmo_buf[1] = (t >> 16) & 0xff;
  mmo_buf[2] = (t >> 8) & 0xff; mmo_buf[3] = t & 0xff;
  mmo_out();
}

void mmo_byte(b)
  unsigned char b;
{
  mmo_buf[(mmo_ptr++) & 3] = b;
  if (!(mmo_ptr & 3)) mmo_out();
}

void mmo_lop(x, y, z) /* output a loader operation */
  char x;
  unsigned char y, z;
{
  mmo_buf[0] = mm; mmo_buf[1] = x; mmo_buf[2] = y; mmo_buf[3] = z;
  mmo_out();
}

void mmo_lopp(x, yz) /* output a loader operation with two-byte operand */
  char x;
  unsigned short yz;
{
  mmo_buf[0] = mm; mmo_buf[1] = x; mmo_buf[2] = yz >> 8; mmo_buf[3] = yz & 0xff;
  mmo_out();
}
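
The reason for assembling bytes in a small buffer, rather than writing a tetra variable directly, is that shifts and masks give the same byte order on every host. The following sketch shows just that packing step, without any file output. The function names tetra_to_bytes and lopp_to_bytes are invented for the illustration; the escape value 0x98 is the mm code cited in the text.

```c
#include <stdint.h>

enum { MM = 0x98 };   /* escape code that introduces a loader operation */

/* Store t in big-endian order, independent of host byte order. */
void tetra_to_bytes(uint32_t t, unsigned char buf[4])
{
    buf[0] = (unsigned char)(t >> 24);
    buf[1] = (unsigned char)((t >> 16) & 0xff);
    buf[2] = (unsigned char)((t >> 8) & 0xff);
    buf[3] = (unsigned char)(t & 0xff);
}

/* Pack a loader operation whose YZ field is one 16-bit operand. */
void lopp_to_bytes(unsigned char x, uint16_t yz, unsigned char buf[4])
{
    buf[0] = MM;
    buf[1] = x;
    buf[2] = (unsigned char)(yz >> 8);
    buf[3] = (unsigned char)(yz & 0xff);
}
```

Writing such a buffer with fwrite(buf, 1, 4, file) produces identical object files on little-endian and big-endian machines.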

49. The mmo_loc subroutine makes the current location in the object file equal to
cur_loc.

⟨Subroutines 28⟩ +≡

void mmo_loc ARGS((void));
void mmo_loc()
{
  octa o;
  if (held_bits) mmo_clear();
  o = ominus(cur_loc, mmo_cur_loc);
  if (o.h == 0 && o.l < 0x10000) {
    if (o.l) mmo_lopp(lop_skip, o.l);
  } else {
    if (cur_loc.h & 0xffffff) {
      mmo_lop(lop_loc, 0, 2);
      mmo_tetra(cur_loc.h);
    } else mmo_lop(lop_loc, cur_loc.h >> 24, 1);
    mmo_tetra(cur_loc.l);
  }
  mmo_cur_loc = cur_loc;
}
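
The choice between lop_skip and the two forms of lop_loc can be seen more easily in a version that works on plain 64-bit integers and returns the tetrabytes instead of writing them. This is a sketch under stated assumptions: encode_loc_change is a hypothetical name, the lopcodes 1 (lop_loc) and 2 (lop_skip) and the escape byte 0x98 are the values quoted elsewhere in the text, and the mask 0xffffff expresses the condition that the short form is usable only when the high tetrabyte has nothing below its top byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Writes into out[] the tetrabytes needed to move the loader's
   location from 'from' to 'to'; returns how many were written (0..3). */
size_t encode_loc_change(uint64_t from, uint64_t to, uint32_t out[3])
{
    uint64_t delta = to - from;            /* wraps if moving backward */
    uint32_t h = (uint32_t)(to >> 32), l = (uint32_t)to;
    size_t n = 0;
    if (delta < 0x10000) {
        if (delta) out[n++] = 0x98020000u | (uint32_t)delta; /* lop_skip */
    } else if (h & 0xffffff) {
        out[n++] = 0x98010002u;            /* lop_loc, Y = 0, Z = 2 */
        out[n++] = h;
        out[n++] = l;
    } else {
        out[n++] = 0x98010001u | ((h >> 24) << 8); /* lop_loc, Y = high byte, Z = 1 */
        out[n++] = l;
    }
    return n;
}
```

A short forward move costs one tetrabyte, a jump to the start of another segment (for example #2000000000000000) costs two, and an arbitrary 64-bit address costs three.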

50. Similarly, the mmo_sync subroutine makes sure that the current file and line
number in the output file agree with cur_file and line_no.

⟨Subroutines 28⟩ +≡

void mmo_sync ARGS((void));
void mmo_sync()
{
  register int j;
  register unsigned char *p;
  if (cur_file != mmo_cur_file) {
    if (filename_passed[cur_file]) mmo_lop(lop_file, cur_file, 0);
    else {
      mmo_lop(lop_file, cur_file, (strlen(filename[cur_file]) + 3) >> 2);
      for (j = 0, p = filename[cur_file]; *p; p++, j = (j + 1) & 3) {
        mmo_buf[j] = *p;
        if (j == 3) mmo_out();
      }
      if (j) {
        for (; j < 4; j++) mmo_buf[j] = 0;
        mmo_out();
      }
      filename_passed[cur_file] = 1;
    }
    mmo_cur_file = cur_file;
    mmo_line_no = 0;
  }
  if (line_no != mmo_line_no) {
    if (line_no >= 0x10000)
      panic("I can't deal with line numbers exceeding 65535");
    mmo_lopp(lop_line, line_no);
    mmo_line_no = line_no;
  }
}



ARGS = macro (), §31.
cur_file: int, §36.
cur_loc: octa, §43.
filename: Char *[], §37.
filename_passed: char [], §51.
h: tetra, §26.
held_bits: unsigned char, §43.
l: tetra, §26.
line_no: int, §36.
lop_file = 0x6, §24.
lop_line = 0x7, §24.
lop_loc = 0x1, §24.
lop_skip = 0x2, §24.
mm = 0x98, §22.
mmo_buf: unsigned char [], §47.
mmo_clear: void (), §47.
mmo_cur_file: int, §51.
mmo_cur_loc: octa, §51.
mmo_line_no: int, §51.
mmo_out: void (), §47.
mmo_ptr: int, §47.
octa = struct, §26.
ominus: octa (), MMIX-ARITH §5.
panic = macro (), §45.
strlen: size_t (), <string.h>.
tetra = unsigned int, §26.

51. ⟨Global variables 27⟩ +≡

octa mmo_cur_loc; /* current location in the object file */
int mmo_line_no; /* current line number in the mmo output so far */
int mmo_cur_file; /* index of the current file in the mmo output so far */
char filename_passed[256]; /* has a filename been recorded in the output? */

52. Here is a basic subroutine that assembles k bytes starting at cur_loc. The value
of k should be 1, 2, or 4, and cur_loc should be a multiple of k. The x_bits parameter
tells which bytes, if any, are part of a future reference.

⟨Subroutines 28⟩ +≡

void assemble ARGS((char, tetra, unsigned char));
void assemble(k, dat, x_bits)
  char k;
  tetra dat;
  unsigned char x_bits;
{
  register int j, jj, l;
  if (spec_mode) l = spec_mode_loc;
  else {
    l = cur_loc.l;
    ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩;
    if (!held_bits && !(cur_loc.h & 0xe0000000)) mmo_sync();
  }
  for (j = 0; j < k; j++) {
    jj = (l + j) & 3;
    hold_buf[jj] = (dat >> (8 * (k - 1 - j))) & 0xff;
    held_bits |= 1 << jj;
    listing_bits |= 1 << jj;
  }
  listing_bits |= x_bits;
  if (((l + k) & 3) == 0) {
    if (listing_file) listing_clear();
    mmo_clear();
  }
  if (spec_mode) spec_mode_loc += k;
  else cur_loc = incr(cur_loc, k);
}

53. ⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩ ≡

if (cur_loc.h != mmo_cur_loc.h || ((cur_loc.l ^ mmo_cur_loc.l) & 0xfffffffc)) mmo_loc();

This code is used in section 52.

54. The symbol table. Symbols are stored and retrieved by means of a ternary
search trie, following ideas of Bentley and Sedgewick. (See ACM-SIAM Symp. on
Discrete Algorithms 8 (1997), 360-369; R. Sedgewick, Algorithms in C (Reading,
Mass.: Addison-Wesley, 1998), §15.4.) Each trie node stores a character, and there
are branches to subtries for the cases where a given character is less than, equal to,
or greater than the character in the trie. There also is a pointer to a symbol table
entry if a symbol ends at the current node.

⟨Type definitions 26⟩ +≡

typedef struct ternary_trie_struct {
  unsigned short ch; /* the (possibly wyde) character stored here */
  struct ternary_trie_struct *left, *mid, *right; /* downward in the ternary trie */
  struct sym_tab_struct *sym; /* equivalents of symbols */
} trie_node;

55. We allocate trie nodes in chunks of 1000 at a time. 

⟨Subroutines 28⟩ +≡

trie_node *new_trie_node ARGS((void));
trie_node *new_trie_node()
{
  register trie_node *t = next_trie_node;
  if (t == last_trie_node) {
    t = (trie_node *) calloc(1000, sizeof(trie_node));
    if (!t) panic("Capacity exceeded: Out of trie memory");
    last_trie_node = t + 1000;
  }
  next_trie_node = t + 1;
  return t;
}



56. ⟨Global variables 27⟩ +≡

trie_node *trie_root; /* root of the trie */
trie_node *op_root; /* root of subtrie for opcodes */
trie_node *next_trie_node, *last_trie_node; /* allocation control */
trie_node *cur_prefix; /* root of subtrie for unqualified symbols */



ARGS = macro (), §31.
calloc: void *(), <stdlib.h>.
cur_loc: octa, §43.
h: tetra, §26.
held_bits: unsigned char, §43.
hold_buf: unsigned char [], §43.
incr: octa (), MMIX-ARITH §6.
l: tetra, §26.
listing_bits: unsigned char, §43.
listing_clear: void (), §44.
listing_file: FILE *, §139.
mmo_clear: void (), §47.
mmo_loc: void (), §49.
mmo_sync: void (), §50.
octa = struct, §26.
panic = macro (), §45.
spec_mode: bool, §43.
spec_mode_loc: tetra, §43.
sym_tab_struct: struct, §58.
tetra = unsigned int, §26.

57. The trie_search subroutine starts at a given node of the trie and finds a given
string in its middle subtrie, inserting new nodes if necessary. The string ends with
the first nonletter or nondigit; the location of the terminating character is stored in
global variable terminator.

#define isletter(c) (isalpha(c) || c == '_' || c == ':' || (unsigned int)(c) > 126)

⟨Subroutines 28⟩ +≡

trie_node *trie_search ARGS((trie_node *, Char *));
Char *terminator; /* where the search ended */

trie_node *trie_search(t, s)
  trie_node *t;
  Char *s;
{
  register trie_node *tt = t;
  register unsigned char *p = (unsigned char *) s;
  while (1) {
    if (!isletter(*p) && !isdigit(*p)) {
      terminator = (Char *) p; return tt;
    }
    if (tt->mid) {
      tt = tt->mid;
      while (*p != tt->ch) {
        if (*p < tt->ch) {
          if (tt->left) tt = tt->left;
          else {
            tt->left = new_trie_node(); tt = tt->left; goto store_new_char;
          }
        } else {
          if (tt->right) tt = tt->right;
          else {
            tt->right = new_trie_node(); tt = tt->right; goto store_new_char;
          }
        }
      }
      p++;
    } else {
      tt->mid = new_trie_node(); tt = tt->mid;
    store_new_char: tt->ch = *p++;
    }
  }
}
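
The search-and-insert idea is perhaps easier to follow in a stripped-down form. The sketch below keeps the three-way branching and the rule that a string lives in the middle subtrie of its starting node, but it simplifies several things: nodes are allocated one at a time, the string ends at its NUL byte rather than at the first nonletter, and the node carries a plain int instead of a symbol table pointer. The names tnode, new_node, and tst_find are invented for the example.

```c
#include <stdlib.h>

typedef struct tnode {
    unsigned char ch;                   /* character stored at this node */
    struct tnode *left, *mid, *right;   /* less than, equal, greater than */
    int value;                          /* stand-in for the symbol pointer */
} tnode;

static tnode *new_node(unsigned char c)
{
    tnode *t = calloc(1, sizeof *t);
    if (!t) abort();
    t->ch = c;
    return t;
}

/* Finds s in the middle subtrie of root, inserting nodes as needed,
   and returns the node at which the string ends. */
tnode *tst_find(tnode *root, const char *s)
{
    tnode *t = root;
    const unsigned char *p = (const unsigned char *)s;
    for (; *p; p++) {
        tnode **link = &t->mid;         /* descend one character level */
        while (*link && (*link)->ch != *p)
            link = (*p < (*link)->ch) ? &(*link)->left : &(*link)->right;
        if (!*link) *link = new_node(*p);
        t = *link;
    }
    return t;
}
```

Looking up the same string twice returns the same node, and a string that extends another (such as ADDU after ADD) continues from the shorter string's final node through its mid link.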

58. Symbol table nodes hold the serial numbers and equivalents of defined symbols. 
They also hold “fixup information” for undefined symbols; this will allow the loader 
to correct any previously assembled instructions that refer to such symbols when they 
are eventually defined. 

In the symbol table node for a defined symbol, the link field has one of the special 
codes DEFINED or REGISTER or PREDEFINED, and the equiv field holds the defined 
value. The serial number is a unique identifier for all user-defined symbols.

In the symbol table node for an undefined symbol, the equiv field is ignored. The
link field points to the first node of fixup information; that node is, in turn, a symbol
table node that might link to other fixups. The serial number in a fixup node is
either 0 or 1 or 2, meaning respectively “fixup the octabyte pointed to by equiv” or
“fixup the relative address in the YZ field of the instruction pointed to by equiv” or
“fixup the relative address in the XYZ field of the instruction pointed to by equiv.”

#define DEFINED (sym_node *) 1 /* code value for octabyte equivalents */
#define REGISTER (sym_node *) 2 /* code value for register-number equivalents */
#define PREDEFINED (sym_node *) 3 /* code value for not-yet-used equivalents */
#define fix_o 0 /* serial code for octabyte fixup */
#define fix_yz 1 /* serial code for relative fixup */
#define fix_xyz 2 /* serial code for JMP fixup */

⟨Type definitions 26⟩ +≡

typedef struct sym_tab_struct {
  int serial; /* serial number of symbol; type number for fixups */
  struct sym_tab_struct *link; /* DEFINED status or link to fixup */
  octa equiv; /* the equivalent value */
} sym_node;



ARGS = macro (), §31.
ch: unsigned short, §54.
Char = char, §30.
isalpha: int (), <ctype.h>.
isdigit: int (), <ctype.h>.
left: trie_node *, §54.
mid: trie_node *, §54.
new_trie_node: trie_node *(), §55.
octa = struct, §26.
right: trie_node *, §54.
trie_node = struct, §54.

59. The allocation of new symbol table nodes proceeds in chunks, like the allocation
of trie nodes. But in this case we also have the possibility of reusing old fixup nodes
that are no longer needed.

#define recycle_fixup(pp) pp->link = sym_avail, sym_avail = pp

⟨Subroutines 28⟩ +≡

sym_node *new_sym_node ARGS((bool));
sym_node *new_sym_node(serialize)
  bool serialize; /* should the new node receive a unique serial number? */
{
  register sym_node *p = sym_avail;
  if (p) {
    sym_avail = p->link; p->link = NULL; p->serial = 0; p->equiv = zero_octa;
  } else {
    p = next_sym_node;
    if (p == last_sym_node) {
      p = (sym_node *) calloc(1000, sizeof(sym_node));
      if (!p) panic("Capacity exceeded: Out of symbol memory");
      last_sym_node = p + 1000;
    }
    next_sym_node = p + 1;
  }
  if (serialize) p->serial = ++serial_number;
  return p;
}
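
The allocation scheme combines two standard techniques: carving nodes out of large calloc'ed chunks, and keeping a stack of recycled nodes that is consulted first. A minimal standalone sketch follows; the type node and the functions new_node_sketch and recycle are illustrative names only, and the chunk size of 1000 simply mirrors the text.

```c
#include <stdlib.h>

typedef struct node {
    struct node *link;
    int serial;
} node;

#define CHUNK 1000

static node *next_node, *last_node;  /* allocation control for the chunk */
static node *avail;                  /* stack of recycled nodes */

node *new_node_sketch(void)
{
    node *p = avail;
    if (p) {                         /* reuse a recycled node first */
        avail = p->link;
        p->link = NULL;
        p->serial = 0;
        return p;
    }
    if (next_node == last_node) {    /* current chunk is exhausted */
        next_node = calloc(CHUNK, sizeof(node));
        if (!next_node) abort();
        last_node = next_node + CHUNK;
    }
    return next_node++;
}

void recycle(node *p)
{
    p->link = avail;                 /* push onto the avail stack */
    avail = p;
}
```

Nodes are never returned to the C library; they either stay in use or wait on the avail stack, which keeps allocation cheap for the many short-lived fixup nodes.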

60. ⟨Global variables 27⟩ +≡
int serial_number;
sym_node *sym_root; /* root of the sym */
sym_node *next_sym_node, *last_sym_node; /* allocation control */
sym_node *sym_avail; /* stack of recycled symbol table nodes */

61. We initialize the trie by inserting all the predefined symbols. Opcodes are given
the prefix ^, to distinguish them from ordinary symbols; this character nicely divides
uppercase letters from lowercase letters.

⟨Initialize everything 29⟩ +≡
trie_root = new_trie_node();
cur_prefix = trie_root;
op_root = new_trie_node();
trie_root->mid = op_root;
trie_root->ch = ':';
op_root->ch = '^';

⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩;

⟨Put the special register names into the trie 66⟩;
( Put other predefined symbols into the trie 70 ) ; 
62. Most of the assembly work can be table driven, based on bits that are stored
as the “equivalents” of opcode symbols like ^ADD.

#define rel_addr_bit   0x1       /* is YZ or XYZ relative? */
#define immed_bit      0x2       /* should opcode be immediate if Z or YZ not register? */
#define zar_bit        0x4       /* should register status of Z be ignored? */
#define zr_bit         0x8       /* must Z be a register? */
#define yar_bit        0x10      /* should register status of Y be ignored? */
#define yr_bit         0x20      /* must Y be a register? */
#define xar_bit        0x40      /* should register status of X be ignored? */
#define xr_bit         0x80      /* must X be a register? */
#define yzar_bit       0x100     /* should register status of YZ be ignored? */
#define yzr_bit        0x200     /* must YZ be a register? */
#define xyzar_bit      0x400     /* should register status of XYZ be ignored? */
#define xyzr_bit       0x800     /* must XYZ be a register? */
#define one_arg_bit    0x1000    /* is it OK to have zero or one operand? */
#define two_arg_bit    0x2000    /* is it OK to have exactly two operands? */
#define three_arg_bit  0x4000    /* is it OK to have exactly three operands? */
#define many_arg_bit   0x8000    /* is it OK to have more than three operands? */
#define align_bits     0x30000   /* how much alignment: byte, wyde, tetra, or octa? */
#define no_label_bit   0x40000   /* should the label be blank? */
#define mem_bit        0x80000   /* must YZ be a memory reference? */
#define spec_bit       0x100000  /* is this opcode allowed in SPEC mode? */

⟨Type definitions 26⟩ +≡
  typedef struct {
    Char *name; /* symbolic opcode */
    short code; /* numeric opcode */
    int bits; /* treatment of operands */
  } op_spec;

  typedef enum {
    SET = 0x100, IS, LOC, PREFIX, BSPEC, ESPEC, GREG, LOCAL,
    BYTE, WYDE, TETRA, OCTA
  } pseudo_op;
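The way these flag bits drive the assembler can be illustrated with a small sketch. The helper names `accepts_operand_count` and `alignment_bytes` are hypothetical and do not appear in MMIXAL; the sketch assumes that the two-bit `align_bits` field holds the base-2 logarithm of the alignment, which matches the byte, wyde, tetra, octa ordering named in the comment above.

```c
#include <assert.h>

enum {
  one_arg_bit = 0x1000, two_arg_bit = 0x2000,
  three_arg_bit = 0x4000, many_arg_bit = 0x8000,
  align_bits = 0x30000, mem_bit = 0x80000
};

/* Hypothetical helper: nonzero if an opcode whose table entry has the
   given bits may be followed by exactly n operands. */
static int accepts_operand_count(int bits, int n) {
  assert(n >= 0);
  if (n <= 1) return (bits & one_arg_bit) != 0;
  if (n == 2) return (bits & two_arg_bit) != 0;
  if (n == 3) return (bits & three_arg_bit) != 0;
  return (bits & many_arg_bit) != 0;
}

/* Hypothetical helper: alignment in bytes (1, 2, 4, or 8), assuming
   align_bits stores log2 of the alignment. */
static int alignment_bytes(int bits) {
  return 1 << ((bits & align_bits) >> 16);
}
```

With the table value 0x240a2 used for ADD, this sketch reports that only three operands are acceptable and that the instruction is tetra-aligned; the value 0xa60a2 used for the load instructions additionally accepts two operands and has `mem_bit` set.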



ARGS = macro (), §31.
bool = enum, §26.
calloc: void *(), <stdlib.h>.
ch: unsigned short, §54.
Char = char, §30.
cur_prefix: trie_node *, §56.
equiv: octa, §58.
link: sym_node *, §58.
mid: trie_node *, §54.
new_trie_node: trie_node *(), §55.
op_root: trie_node *, §56.
panic = macro (), §45.
serial: int, §58.
sym_node = struct, §58.
trie_root: trie_node *, §56.
zero_octa: octa, MMIX-ARITH §4.
63. ⟨Global variables 27⟩ +≡
  op_spec op_init_table[] = {
    {"TRAP", 0x00, 0x27554}, {"FCMP", 0x01, 0x240a8},
    {"FUN", 0x02, 0x240a8}, {"FEQL", 0x03, 0x240a8},
    {"FADD", 0x04, 0x240a8}, {"FIX", 0x05, 0x26288},
    {"FSUB", 0x06, 0x240a8}, {"FIXU", 0x07, 0x26288},
    {"FLOT", 0x08, 0x26282}, {"FLOTU", 0x0a, 0x26282},
    {"SFLOT", 0x0c, 0x26282}, {"SFLOTU", 0x0e, 0x26282},
    {"FMUL", 0x10, 0x240a8}, {"FCMPE", 0x11, 0x240a8},
    {"FUNE", 0x12, 0x240a8}, {"FEQLE", 0x13, 0x240a8},
    {"FDIV", 0x14, 0x240a8}, {"FSQRT", 0x15, 0x26288},
    {"FREM", 0x16, 0x240a8}, {"FINT", 0x17, 0x26288},
    {"MUL", 0x18, 0x240a2}, {"MULU", 0x1a, 0x240a2},
    {"DIV", 0x1c, 0x240a2}, {"DIVU", 0x1e, 0x240a2},
    {"ADD", 0x20, 0x240a2}, {"ADDU", 0x22, 0x240a2},
    {"SUB", 0x24, 0x240a2}, {"SUBU", 0x26, 0x240a2},
    {"2ADDU", 0x28, 0x240a2}, {"4ADDU", 0x2a, 0x240a2},
    {"8ADDU", 0x2c, 0x240a2}, {"16ADDU", 0x2e, 0x240a2},
    {"CMP", 0x30, 0x240a2}, {"CMPU", 0x32, 0x240a2},
    {"NEG", 0x34, 0x26082}, {"NEGU", 0x36, 0x26082},
    {"SL", 0x38, 0x240a2}, {"SLU", 0x3a, 0x240a2},
    {"SR", 0x3c, 0x240a2}, {"SRU", 0x3e, 0x240a2},
    {"BN", 0x40, 0x22081}, {"BZ", 0x42, 0x22081},
    {"BP", 0x44, 0x22081}, {"BOD", 0x46, 0x22081},
    {"BNN", 0x48, 0x22081}, {"BNZ", 0x4a, 0x22081},
    {"BNP", 0x4c, 0x22081}, {"BEV", 0x4e, 0x22081},
    {"PBN", 0x50, 0x22081}, {"PBZ", 0x52, 0x22081},
    {"PBP", 0x54, 0x22081}, {"PBOD", 0x56, 0x22081},
    {"PBNN", 0x58, 0x22081}, {"PBNZ", 0x5a, 0x22081},
    {"PBNP", 0x5c, 0x22081}, {"PBEV", 0x5e, 0x22081},
    {"CSN", 0x60, 0x240a2}, {"CSZ", 0x62, 0x240a2},
    {"CSP", 0x64, 0x240a2}, {"CSOD", 0x66, 0x240a2},
    {"CSNN", 0x68, 0x240a2}, {"CSNZ", 0x6a, 0x240a2},
    {"CSNP", 0x6c, 0x240a2}, {"CSEV", 0x6e, 0x240a2},
    {"ZSN", 0x70, 0x240a2}, {"ZSZ", 0x72, 0x240a2},
    {"ZSP", 0x74, 0x240a2}, {"ZSOD", 0x76, 0x240a2},
    {"ZSNN", 0x78, 0x240a2}, {"ZSNZ", 0x7a, 0x240a2},
    {"ZSNP", 0x7c, 0x240a2}, {"ZSEV", 0x7e, 0x240a2},
    {"LDB", 0x80, 0xa60a2}, {"LDBU", 0x82, 0xa60a2},
    {"LDW", 0x84, 0xa60a2}, {"LDWU", 0x86, 0xa60a2},
    {"LDT", 0x88, 0xa60a2}, {"LDTU", 0x8a, 0xa60a2},
    {"LDO", 0x8c, 0xa60a2}, {"LDOU", 0x8e, 0xa60a2},
    {"LDSF", 0x90, 0xa60a2}, {"LDHT", 0x92, 0xa60a2},
    {"CSWAP", 0x94, 0xa60a2}, {"LDUNC", 0x96, 0xa60a2},
    {"LDVTS", 0x98, 0xa60a2}, {"PRELD", 0x9a, 0xa6022},
    {"PREGO", 0x9c, 0xa6022}, {"GO", 0x9e, 0xa60a2},
    {"STB", 0xa0, 0xa60a2}, {"STBU", 0xa2, 0xa60a2},
    {"STW", 0xa4, 0xa60a2}, {"STWU", 0xa6, 0xa60a2},
    {"STT", 0xa8, 0xa60a2}, {"STTU", 0xaa, 0xa60a2},
    {"STO", 0xac, 0xa60a2}, {"STOU", 0xae, 0xa60a2},
    {"STSF", 0xb0, 0xa60a2}, {"STHT", 0xb2, 0xa60a2},
    {"STCO", 0xb4, 0xa6022}, {"STUNC", 0xb6, 0xa60a2},
    {"SYNCD", 0xb8, 0xa6022}, {"PREST", 0xba, 0xa6022},
    {"SYNCID", 0xbc, 0xa6022}, {"PUSHGO", 0xbe, 0xa6062},
    {"OR", 0xc0, 0x240a2}, {"ORN", 0xc2, 0x240a2},
    {"NOR", 0xc4, 0x240a2}, {"XOR", 0xc6, 0x240a2},
    {"AND", 0xc8, 0x240a2}, {"ANDN", 0xca, 0x240a2},
    {"NAND", 0xcc, 0x240a2}, {"NXOR", 0xce, 0x240a2},
    {"BDIF", 0xd0, 0x240a2}, {"WDIF", 0xd2, 0x240a2},
    {"TDIF", 0xd4, 0x240a2}, {"ODIF", 0xd6, 0x240a2},
    {"MUX", 0xd8, 0x240a2}, {"SADD", 0xda, 0x240a2},
    {"MOR", 0xdc, 0x240a2}, {"MXOR", 0xde, 0x240a2},
    {"SETH", 0xe0, 0x22080}, {"SETMH", 0xe1, 0x22080},
    {"SETML", 0xe2, 0x22080}, {"SETL", 0xe3, 0x22080},
    {"INCH", 0xe4, 0x22080}, {"INCMH", 0xe5, 0x22080},
    {"INCML", 0xe6, 0x22080}, {"INCL", 0xe7, 0x22080},
    {"ORH", 0xe8, 0x22080}, {"ORMH", 0xe9, 0x22080},
    {"ORML", 0xea, 0x22080}, {"ORL", 0xeb, 0x22080},
    {"ANDNH", 0xec, 0x22080}, {"ANDNMH", 0xed, 0x22080},
    {"ANDNML", 0xee, 0x22080}, {"ANDNL", 0xef, 0x22080},
    {"JMP", 0xf0, 0x21001}, {"PUSHJ", 0xf2, 0x22041},
    {"GETA", 0xf4, 0x22081}, {"PUT", 0xf6, 0x22002},
    {"POP", 0xf8, 0x23000}, {"RESUME", 0xf9, 0x21000},
    {"SAVE", 0xfa, 0x22080}, {"UNSAVE", 0xfb, 0x23a00},
    {"SYNC", 0xfc, 0x21000}, {"SWYM", 0xfd, 0x27554},
    {"GET", 0xfe, 0x22080}, {"TRIP", 0xff, 0x27554},
    {"SET", SET, 0x22180}, {"LDA", 0x22, 0xa60a2},
    {"IS", IS, 0x101400}, {"LOC", LOC, 0x1400},
    {"PREFIX", PREFIX, 0x141000},
    {"BYTE", BYTE, 0x10f000}, {"WYDE", WYDE, 0x11f000},
    {"TETRA", TETRA, 0x12f000}, {"OCTA", OCTA, 0x13f000},
    {"BSPEC", BSPEC, 0x41400}, {"ESPEC", ESPEC, 0x141000},
    {"GREG", GREG, 0x101000}, {"LOCAL", LOCAL, 0x141800}};
  int op_init_size; /* the number of items in op_init_table */



BSPEC = #104, §62.
BYTE = #108, §62.
ESPEC = #105, §62.
GREG = #106, §62.
IS = #101, §62.
LOC = #102, §62.
LOCAL = #107, §62.
OCTA = #10b, §62.
op_spec = struct, §62.
PREFIX = #103, §62.
SET = #100, §62.
TETRA = #10a, §62.
WYDE = #109, §62.
64. ⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩ ≡
  op_init_size = (sizeof op_init_table) / sizeof(op_spec);
  for (j = 0; j < op_init_size; j++) {
    tt = trie_search(op_root, op_init_table[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = op_init_table[j].code, pp->equiv.l = op_init_table[j].bits;
  }

This code is used in section 61.

65. ⟨Local variables 40⟩ +≡
  register trie_node *tt;
  register sym_node *pp, *qq;

66. ⟨Put the special register names into the trie 66⟩ ≡
  for (j = 0; j < 32; j++) {
    tt = trie_search(trie_root, special_name[j]);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.l = j;
  }

This code is used in section 61.



67. ⟨Global variables 27⟩ +≡
  Char *special_name[32] = {"rB", "rD", "rE", "rH", "rJ", "rM", "rR", "rBB", "rC", "rN",
    "rO", "rS", "rI", "rT", "rTT", "rK", "rQ", "rU", "rV", "rG", "rL", "rA", "rF", "rP",
    "rW", "rX", "rY", "rZ", "rWW", "rXX", "rYY", "rZZ"};



68. ⟨Type definitions 26⟩ +≡
  typedef struct {
    Char *name;
    tetra h, l;
  } predef_spec;



69. ⟨Global variables 27⟩ +≡
  predef_spec predefs[] = {
    {"ROUND_CURRENT", 0, 0}, {"ROUND_OFF", 0, 1}, {"ROUND_UP", 0, 2},
    {"ROUND_DOWN", 0, 3}, {"ROUND_NEAR", 0, 4},
    {"Inf", 0x7ff00000, 0},
    {"Data_Segment", 0x20000000, 0}, {"Pool_Segment", 0x40000000, 0},
    {"Stack_Segment", 0x60000000, 0},
    {"D_BIT", 0, 0x80}, {"V_BIT", 0, 0x40}, {"W_BIT", 0, 0x20}, {"I_BIT", 0, 0x10},
    {"O_BIT", 0, 0x08}, {"U_BIT", 0, 0x04}, {"Z_BIT", 0, 0x02}, {"X_BIT", 0, 0x01},
    {"D_Handler", 0, 0x10}, {"V_Handler", 0, 0x20}, {"W_Handler", 0, 0x30},
    {"I_Handler", 0, 0x40}, {"O_Handler", 0, 0x50}, {"U_Handler", 0, 0x60},
    {"Z_Handler", 0, 0x70}, {"X_Handler", 0, 0x80},
    {"StdIn", 0, 0}, {"StdOut", 0, 1}, {"StdErr", 0, 2},
    {"TextRead", 0, 0}, {"TextWrite", 0, 1}, {"BinaryRead", 0, 2},
    {"BinaryWrite", 0, 3}, {"BinaryReadWrite", 0, 4},
    {"Halt", 0, 0}, {"Fopen", 0, 1}, {"Fclose", 0, 2}, {"Fread", 0, 3}, {"Fgets", 0, 4},
    {"Fgetws", 0, 5}, {"Fwrite", 0, 6}, {"Fputs", 0, 7}, {"Fputws", 0, 8}, {"Fseek", 0, 9},
    {"Ftell", 0, 10}};
  int predef_size;
70. ⟨Put other predefined symbols into the trie 70⟩ ≡
  predef_size = (sizeof predefs) / sizeof(predef_spec);
  for (j = 0; j < predef_size; j++) {
    tt = trie_search(trie_root, predefs[j].name);
    pp = tt->sym = new_sym_node(false);
    pp->link = PREDEFINED;
    pp->equiv.h = predefs[j].h, pp->equiv.l = predefs[j].l;
  }

This code is used in section 61.

71. We place Main into the trie at the beginning of assembly, so that it will show 
up as an undefined symbol if the user specifies no starting point. 

⟨Initialize everything 29⟩ +≡
  trie_search(trie_root, "Main")->sym = new_sym_node(true);



bits: int, §62.
Char = char, §30.
code: short, §62.
equiv: octa, §58.
false = 0, §26.
h: tetra, §26.
j: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
name: Char *, §62.
new_sym_node: sym_node *(), §59.
op_init_size: int, §63.
op_init_table: op_spec [], §63.
op_root: trie_node *, §56.
op_spec = struct, §62.
PREDEFINED = macro, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
tetra = unsigned int, §26.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *(), §57.
true = 1, §26.
72. At the end of assembly we traverse the entire symbol table, visiting each symbol
in lexicographic order and transmitting the trie structure to the output file. We detect
any undefined future references at this time.

The order of traversal has a simple recursive pattern: To traverse the subtrie rooted
at t, we

  traverse t->left, if the left subtrie is nonempty;
  visit t->sym, if this symbol table entry is present;
  traverse t->mid, if the middle subtrie is nonempty;
  traverse t->right, if the right subtrie is nonempty.
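The four-step pattern can be tried out on a toy ternary trie. The following is only a sketch under simplifying assumptions: the `node` type, `insert`, and `traverse` are hypothetical stand-ins (a `present` flag replaces the `sym` pointer, and keys are plain C strings), not the routines of MMIXAL itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for trie_node; present plays the role of a non-null sym. */
typedef struct node {
  struct node *left, *mid, *right;
  char ch;
  int present;
} node;

/* Insert the nonempty key s into the ternary trie rooted at t. */
static node *insert(node *t, const char *s) {
  assert(s != NULL && *s != '\0');
  if (!t) {
    t = calloc(1, sizeof *t);
    if (!t) abort();
    t->ch = *s;
  }
  if (*s < t->ch) t->left = insert(t->left, s);
  else if (*s > t->ch) t->right = insert(t->right, s);
  else if (s[1]) t->mid = insert(t->mid, s + 1);
  else t->present = 1;
  return t;
}

/* Traverse left, visit, traverse mid, traverse right.
   prefix[0..depth-1] holds the characters on middle branches above t;
   each key visited is appended to out, followed by a newline. */
static void traverse(const node *t, char *prefix, size_t depth, char *out) {
  if (!t) return;
  traverse(t->left, prefix, depth, out);
  prefix[depth] = t->ch;
  prefix[depth + 1] = '\0';
  if (t->present) {
    strcat(out, prefix);
    strcat(out, "\n");
  }
  traverse(t->mid, prefix, depth + 1, out);
  traverse(t->right, prefix, depth, out);
}
```

Inserting the keys b, ab, a, abc in that order and then traversing yields a, ab, abc, b: a key is visited before its extensions because the visit step precedes the middle subtrie.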

This pattern leads to a compact representation in the mmo file, usually requiring fewer
than two bytes per trie node plus the bytes needed to encode the equivalents and
serial numbers. Each node of the trie is encoded as a “master byte” followed by the
encodings of the left subtrie, character, equivalent, middle subtrie, and right subtrie.
The master byte is the sum of

  #80, if the character occupies two bytes instead of one;
  #40, if the left subtrie is nonempty;
  #20, if the middle subtrie is nonempty;
  #10, if the right subtrie is nonempty;
  #01 to #08, if the symbol’s equivalent is one to eight bytes long;
  #09 to #0e, if the symbol’s equivalent is 2^61 plus one to six bytes;
  #0f, if the symbol’s equivalent is $0 plus one byte;

the character is omitted if the middle subtrie and the equivalent are both empty. The
“equivalent” of an undefined symbol is zero, but stated as two bytes long. Symbol
equivalents are followed by the serial number, represented as a sequence of one or
more bytes in radix 128; the final byte of the serial number is tagged by adding 128.
(Thus, serial number 2^14 - 1 is encoded as #7fff; serial number 2^14 is #010080.)
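The radix-128 rule for serial numbers can be checked with a few lines of C. This is a sketch, not the assembler's own code: the function name `encode_serial` and its output-buffer interface are assumptions made for illustration, although the two loops follow the same idea as the output code of section 75.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: write serial in radix 128, most significant digit
   first, adding 0x80 to the final byte. Returns the number of bytes
   written (at most 5 for a 32-bit value). */
static size_t encode_serial(unsigned int serial, unsigned char out[5]) {
  int m;
  size_t n = 0;
  assert(out != NULL);
  /* find the smallest m such that serial fits in m + 1 seven-bit digits */
  for (m = 0; m < 4; m++)
    if (serial < (1u << (7 * (m + 1)))) break;
  for (; m >= 0; m--)
    out[n++] = (unsigned char)(((serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
  return n;
}
```

Applied to 2^14 - 1 the sketch produces the two bytes 7f ff, and applied to 2^14 it produces 01 00 80, in agreement with the parenthetical example.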

73. First we prune the trie by removing all predefined symbols that the user did
not redefine.

⟨Subroutines 28⟩ +≡
  trie_node *prune ARGS((trie_node *));
  trie_node *prune(t)
    trie_node *t;
  {
    register int useful = 0;
    if (t->sym) {
      if (t->sym->serial) useful = 1;
      else t->sym = NULL;
    }
    if (t->left) {
      t->left = prune(t->left);
      if (t->left) useful = 1;
    }
    if (t->mid) {
      t->mid = prune(t->mid);
      if (t->mid) useful = 1;
    }
    if (t->right) {
      t->right = prune(t->right);
      if (t->right) useful = 1;
    }
    if (useful) return t;
    else return NULL;
  }

74. Then we output the trie by following the recursive traversal pattern.

⟨Subroutines 28⟩ +≡
  void out_stab ARGS((trie_node *));
  void out_stab(t)
    trie_node *t;
  {
    register int m = 0, j;
    register sym_node *pp;
    if (t->ch > 0xff) m += 0x80;
    if (t->left) m += 0x40;
    if (t->mid) m += 0x20;
    if (t->right) m += 0x10;
    if (t->sym) {
      if (t->sym->link == REGISTER) m += 0xf;
      else if (t->sym->link == DEFINED) ⟨Encode the length of t->sym->equiv 76⟩
      else if (t->sym->link || t->sym->serial == 1) ⟨Report an undefined symbol 79⟩;
    }
    mmo_byte(m);
    if (t->left) out_stab(t->left);
    if (m & 0x2f) ⟨Visit t and traverse t->mid 75⟩;
    if (t->right) out_stab(t->right);
  }



ARGS = macro (), §31.
ch: unsigned short, §54.
DEFINED = macro, §58.
equiv: octa, §58.
left: trie_node *, §54.
link: sym_node *, §58.
mid: trie_node *, §54.
mmo_byte: void (), §48.
REGISTER = macro, §58.
right: trie_node *, §54.
serial: int, §58.
sym: sym_node *, §54.
sym_node = struct, §58.
trie_node = struct, §54.
75. A global variable called sym_buf holds all characters on middle branches to the
current trie node; sym_ptr is the first currently unused character in sym_buf.

⟨Visit t and traverse t->mid 75⟩ ≡
  {
    if (m & 0x80) mmo_byte(t->ch >> 8);
    mmo_byte(t->ch & 0xff);
    *sym_ptr++ = (m & 0x80 ? '?' : t->ch); /* Unicode? not yet */
    m &= 0xf;
    if (m && t->sym->link) {
      if (listing_file) ⟨Print symbol sym_buf and its equivalent 78⟩;
      if (m == 15) m = 1;
      else if (m > 8) m -= 8;
      for ( ; m > 0; m--)
        if (m > 4) mmo_byte((t->sym->equiv.h >> (8 * (m - 5))) & 0xff);
        else mmo_byte((t->sym->equiv.l >> (8 * (m - 1))) & 0xff);
      for (m = 0; m < 4; m++)
        if (t->sym->serial < (1 << (7 * (m + 1)))) break;
      for ( ; m >= 0; m--)
        mmo_byte(((t->sym->serial >> (7 * m)) & 0x7f) + (m ? 0 : 0x80));
    }
    if (t->mid) out_stab(t->mid);
    sym_ptr--;
  }

This code is used in section 74.

76. ⟨Encode the length of t->sym->equiv 76⟩ ≡
  { register tetra x;
    if ((t->sym->equiv.h & 0xffff0000) == 0x20000000)
      m += 8, x = t->sym->equiv.h - 0x20000000; /* data segment */
    else x = t->sym->equiv.h;
    if (x) m += 4; else x = t->sym->equiv.l;
    for (j = 1; j < 4; j++)
      if (x < (1 << (8 * j))) break;
    m += j;
  }

This code is used in section 74.
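To see which length codes this computation produces, the same steps can be wrapped in a standalone function. The name `equiv_length_code` is hypothetical, and the sketch assumes that `tetra` is a 32-bit unsigned type, as the mini-indexes state; it returns only the low four bits that section 76 adds to the master byte.

```c
#include <assert.h>

typedef unsigned int tetra; /* assumed to be 32 bits wide */

/* Hypothetical helper: length code (1..14) for the equivalent whose
   high and low halves are h and l. */
static int equiv_length_code(tetra h, tetra l) {
  tetra x;
  int m = 0, j;
  if ((h & 0xffff0000u) == 0x20000000u) {
    m += 8;                 /* data segment: only the offset from 2^61 is stored */
    x = h - 0x20000000u;
  } else x = h;
  if (x) m += 4;            /* at least five bytes are needed */
  else x = l;
  for (j = 1; j < 4; j++)   /* count the significant bytes of x */
    if (x < (1u << (8 * j))) break;
  assert(m + j >= 1 && m + j <= 14);
  return m + j;
}
```

For instance, the equivalent #0000000000000100 gets code 2, while the Data_Segment address #2000000000001234 gets code 10, meaning 2^61 plus two bytes.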

77. We make room for symbols up to 999 bytes long. Strictly speaking, the program
should check if this limit is exceeded; but really!

⟨Global variables 27⟩ +≡
  Char sym_buf[1000];
  Char *sym_ptr;
78. The initial ‘:’ of each fully qualified symbol is omitted here, since most users of
MMIXAL will probably not need the PREFIX feature. One consequence of this omission
is that the one-character symbol ‘:’ itself, which is allowed by the rules of MMIXAL, is
printed as the null string.

⟨Print symbol sym_buf and its equivalent 78⟩ ≡
  {
    *sym_ptr = '\0';
    fprintf(listing_file, " %s = ", sym_buf + 1);
    pp = t->sym;
    if (pp->link == DEFINED) fprintf(listing_file, "#%08x%08x", pp->equiv.h, pp->equiv.l);
    else if (pp->link == REGISTER) fprintf(listing_file, "$%03d", pp->equiv.l);
    else fprintf(listing_file, "?");
    fprintf(listing_file, " (%d)\n", pp->serial);
  }

This code is used in section 75.

79. ⟨Report an undefined symbol 79⟩ ≡
  {
    *sym_ptr = (m & 0x80 ? '?' : t->ch); /* Unicode? not yet */
    *(sym_ptr + 1) = '\0';
    fprintf(stderr, "undefined symbol: %s\n", sym_buf + 1);
    err_count++;
    m += 2;
  }

This code is used in section 74.

80. ⟨Check and output the trie 80⟩ ≡
  op_root->mid = NULL; /* annihilate all the opcodes */
  prune(trie_root);
  sym_ptr = sym_buf;
  if (listing_file) fprintf(listing_file, "\nSymbol table:\n");
  mmo_lop(lop_stab, 0, 0);
  out_stab(trie_root);
  while (mmo_ptr & 3) mmo_byte(0);
  mmo_lopp(lop_end, mmo_ptr >> 2);

This code is used in section 142.



ch: unsigned short, §54.
Char = char, §30.
DEFINED = macro, §58.
equiv: octa, §58.
err_count: int, §46.
fprintf: int (), <stdio.h>.
h: tetra, §26.
j: register int, §74.
l: tetra, §26.
link: sym_node *, §58.
listing_file: FILE *, §139.
lop_end = #c, §24.
lop_stab = #b, §24.
m: register int, §74.
mid: trie_node *, §54.
mmo_byte: void (), §48.
mmo_lop: void (), §48.
mmo_lopp: void (), §48.
mmo_ptr: int, §47.
op_root: trie_node *, §56.
out_stab: void (), §74.
pp: register sym_node *, §74.
prune: trie_node *(), §73.
REGISTER = macro, §58.
serial: int, §58.
stderr: FILE *, <stdio.h>.
sym: sym_node *, §54.
t: trie_node *, §74.
tetra = unsigned int, §26.
trie_root: trie_node *, §56.
81. Expressions. The most intricate part of the assembly process is the task
of scanning and evaluating expressions in the operand field. Fortunately, MMIXAL’s
expressions have a simple structure that can be handled easily with a stack-based
approach.

Two stacks hold pending data as the operand field is scanned and evaluated. The
op_stack contains operators that have not yet been performed; the val_stack contains
values that have not yet been used. After an entire operand list has been scanned,
the op_stack will be empty and the val_stack will hold the operand values needed to
assemble the current instruction.

82. Entries on op_stack have one of the constant values defined here, and they have
one of the precedence levels defined here.

Entries on val_stack have equiv, link, and status fields; the link points to a trie
node if the expression is a symbol that has not yet been subjected to any operations.

⟨Type definitions 26⟩ +≡
  typedef enum {
    negate, serialize, complement, registerize, inner_lp,
    plus, minus, times, over, frac, mod, shl, shr, and, or, xor,
    outer_lp, outer_rp, inner_rp
  } stack_op;
  typedef enum {
    zero, weak, strong, unary
  } prec;
  typedef enum {
    pure, reg_val, undefined
  } stat;
  typedef struct {
    octa equiv; /* current value */
    trie_node *link; /* trie reference for symbol */
    stat status; /* pure, reg_val, or undefined */
  } val_node;

83. #define top_op op_stack[op_ptr - 1] /* top entry on the operator stack */
#define top_val val_stack[val_ptr - 1] /* top entry on the value stack */
#define next_val val_stack[val_ptr - 2] /* next-to-top entry of the value stack */

⟨Global variables 27⟩ +≡
  stack_op *op_stack; /* stack for pending operators */
  int op_ptr; /* number of items on op_stack */
  val_node *val_stack; /* stack for pending operands */
  int val_ptr; /* number of items on val_stack */
  prec precedence[] = {unary, unary, unary, unary, zero,
    weak, weak, strong, strong, strong, strong, strong, strong, strong, weak, weak,
    zero, zero, zero}; /* precedences of the respective stack_op values */
  stack_op rt_op; /* newly scanned operator */
  octa acc; /* temporary accumulator */
84. ⟨Initialize everything 29⟩ +≡
  op_stack = (stack_op *) calloc(buf_size, sizeof(stack_op));
  val_stack = (val_node *) calloc(buf_size, sizeof(val_node));
  if (!op_stack || !val_stack) panic("No room for the stacks");

85. The operand field of an instruction will have been copied into a separate Char
array called operand_list when we reach this part of the program.

⟨Scan the operand field 85⟩ ≡
  p = operand_list;
  val_ptr = 0; /* val_stack is empty */
  op_stack[0] = outer_lp, op_ptr = 1;
    /* op_stack contains an “outer left parenthesis” */
  while (1) {
    ⟨Scan opening tokens until putting something on val_stack 86⟩;
  scan_close: ⟨Scan a binary operator or closing token, rt_op 97⟩;
    while (precedence[top_op] >= precedence[rt_op])
      ⟨Perform the top operation on op_stack 98⟩;
  hold_op: op_stack[op_ptr++] = rt_op;
  }
  operands_done: ;

This code is used in section 102.
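The control structure of this loop is easier to follow on a cut-down example. The sketch below keeps the two stacks, the outer and inner parentheses, and the rule of performing the top operator while its precedence is at least that of the newly scanned one; everything else is simplified. It assumes well-formed input without spaces, handles only decimal constants with + - * / and parentheses, uses 64-bit integers instead of octa values, and the function name `eval` and the fixed stack size are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

#define STACK_MAX 64 /* hypothetical fixed capacity for this sketch */

typedef enum { inner_lp, plus, minus, times, over,
               outer_lp, outer_rp, inner_rp } stack_op;
/* zero for parentheses, weak for + and -, strong for * and / */
static const int precedence[] = { 0, 1, 1, 2, 2, 0, 0, 0 };

static uint64_t eval(const char *p) {
  stack_op op_stack[STACK_MAX];
  uint64_t val_stack[STACK_MAX];
  int op_ptr = 0, val_ptr = 0;
  stack_op rt_op;

  op_stack[op_ptr++] = outer_lp;
  for (;;) {
    uint64_t v = 0;
    /* opening tokens: left parentheses, then one constant */
    while (*p == '(') {
      assert(op_ptr < STACK_MAX);
      op_stack[op_ptr++] = inner_lp;
      p++;
    }
    while (isdigit((unsigned char)*p)) v = 10 * v + (uint64_t)(*p++ - '0');
    assert(val_ptr < STACK_MAX);
    val_stack[val_ptr++] = v;
  scan_close:
    switch (*p) {
      case '+': rt_op = plus; p++; break;
      case '-': rt_op = minus; p++; break;
      case '*': rt_op = times; p++; break;
      case '/': rt_op = over; p++; break;
      case ')': rt_op = inner_rp; p++; break;
      default: rt_op = outer_rp; break; /* end of the expression */
    }
    while (precedence[op_stack[op_ptr - 1]] >= precedence[rt_op]) {
      stack_op top = op_stack[--op_ptr];
      uint64_t y;
      if (top == outer_lp) return val_stack[val_ptr - 1];
      if (top == inner_lp) {
        if (rt_op == inner_rp) goto scan_close;
        continue; /* unmatched '(' is silently closed in this sketch */
      }
      y = val_stack[--val_ptr];
      switch (top) {
        case plus: val_stack[val_ptr - 1] += y; break;
        case minus: val_stack[val_ptr - 1] -= y; break;
        case times: val_stack[val_ptr - 1] *= y; break;
        case over: val_stack[val_ptr - 1] = y ? val_stack[val_ptr - 1] / y : 0; break;
        default: break;
      }
    }
    assert(op_ptr < STACK_MAX);
    op_stack[op_ptr++] = rt_op;
  }
}
```

Because the comparison is "greater than or equal", operators of equal precedence are performed left to right, so "8-3-2" gives 3; and because the outer left parenthesis and the end of the expression both have precedence zero, the end of input empties the operator stack.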



buf_size: int, §139.
calloc: void *(), <stdlib.h>.
octa = struct, §26.
operand_list: Char *, §33.
p: register Char *, §40.
panic = macro (), §45.
trie_node = struct, §54.
86. A comment that follows an empty operand list needs to be detected here.

⟨Scan opening tokens until putting something on val_stack 86⟩ ≡
  scan_open: if (isletter(*p)) ⟨Scan a symbol 87⟩
  else if (isdigit(*p)) {
    if (*(p + 1) == 'F') ⟨Scan a forward local 88⟩
    else if (*(p + 1) == 'B') ⟨Scan a backward local 89⟩
    else ⟨Scan a decimal constant 94⟩;
  } else switch (*p++) {
    case '#': ⟨Scan a hexadecimal constant 95⟩; break;
    case '\'': ⟨Scan a character constant 92⟩; break;
    case '\"': ⟨Scan a string constant 93⟩; break;
    case '@': ⟨Scan the current location 96⟩; break;
    case '-': op_stack[op_ptr++] = negate;
    case '+': goto scan_open;
    case '&': op_stack[op_ptr++] = serialize; goto scan_open;
    case '~': op_stack[op_ptr++] = complement; goto scan_open;
    case '$': op_stack[op_ptr++] = registerize; goto scan_open;
    case '(': op_stack[op_ptr++] = inner_lp; goto scan_open;
    default:
      if (p == operand_list + 1) { /* treat operand list as empty */
        operand_list[0] = '0', operand_list[1] = '\0', p = operand_list;
        goto scan_open;
      }
      if (*(p - 1)) derr("syntax error at character `%c'", *(p - 1));
      derr("syntax error after character `%c'", *(p - 2));
  }

This code is used in section 85.

87. ⟨Scan a symbol 87⟩ ≡
  {
    if (*p == ':') tt = trie_search(trie_root, p + 1);
    else tt = trie_search(cur_prefix, p);
    p = terminator;
  symbol_found: val_ptr++;
    pp = tt->sym;
    if (!pp) pp = tt->sym = new_sym_node(true);
    top_val.link = tt, top_val.equiv = pp->equiv;
    if (pp->link == PREDEFINED) pp->link = DEFINED;
    top_val.status = (pp->link == DEFINED ? pure : pp->link == REGISTER ? reg_val : undefined);
  }

This code is used in section 86.
88. ⟨Scan a forward local 88⟩ ≡
  {
    tt = &forward_local_host[*p - '0']; p += 2; goto symbol_found;
  }

This code is used in section 86.

89. ⟨Scan a backward local 89⟩ ≡
  {
    tt = &backward_local_host[*p - '0']; p += 2; goto symbol_found;
  }

This code is used in section 86.

90. Statically allocated variables forward_local_host[j] and backward_local_host[j]
masquerade as nodes of the trie.

⟨Global variables 27⟩ +≡
  trie_node forward_local_host[10], backward_local_host[10];
  sym_node forward_local[10], backward_local[10];

91. Initially 0H, 1H, ..., 9H are defined to be zero.

⟨Initialize everything 29⟩ +≡
  for (j = 0; j < 10; j++) {
    forward_local_host[j].sym = &forward_local[j];
    backward_local_host[j].sym = &backward_local[j];
    backward_local[j].link = DEFINED;
  }



complement = 2, §82.
cur_prefix: trie_node *, §56.
DEFINED = macro, §58.
derr = macro (), §45.
equiv: octa, §82.
equiv: octa, §58.
inner_lp = 4, §82.
isdigit: int (), <ctype.h>.
isletter = macro (), §57.
j: register int, §136.
link: trie_node *, §82.
link: sym_node *, §58.
negate = 0, §82.
new_sym_node: sym_node *(), §59.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
PREDEFINED = macro, §58.
pure = 0, §82.
reg_val = 1, §82.
REGISTER = macro, §58.
registerize = 3, §82.
serialize = 1, §82.
status: stat, §82.
sym: sym_node *, §54.
sym_node = struct, §58.
terminator: Char *, §57.
top_val = macro, §83.
trie_node = struct, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *(), §57.
true = 1, §26.
tt: register trie_node *, §65.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
92. We have already checked to make sure that the character constant is legal.

⟨Scan a character constant 92⟩ ≡
  acc.h = 0, acc.l = (unsigned char) *p;
  p += 2;
  goto constant_found;

This code is used in section 86.

93. ⟨Scan a string constant 93⟩ ≡
  acc.h = 0, acc.l = (unsigned char) *p;
  if (*p == '\"') {
    p++;
    acc.l = 0;
    err("*null string is treated as zero");
  } else if (*(p + 1) == '\"') p += 2;
  else *p = '\"', *--p = ',';
  goto constant_found;

This code is used in section 86.
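The last branch is terse: when more characters remain, the string is rewritten in place so that the rest of it looks like a further operand, which is why section 106 copies the operand field into a buffer that may be changed. The following sketch isolates that step; the function name `scan_string_char` and its pointer-to-pointer interface are assumptions for illustration, and the warning for a null string is omitted.

```c
#include <assert.h>

/* Hypothetical helper: *pp points just past an opening double quote.
   Returns the value of the next string character and advances or
   rewrites the buffer the way the three branches of section 93 do. */
static unsigned char scan_string_char(char **pp) {
  char *p = *pp;
  unsigned char c = (unsigned char)*p;
  assert(*p != '\0');
  if (*p == '"') {            /* null string: value zero */
    p++;
    c = 0;
  } else if (p[1] == '"') {   /* last character: skip it and the closing quote */
    p += 2;
  } else {                    /* more to come: turn "ab..." into ,"b..." */
    *p = '"';
    *--p = ',';
  }
  *pp = p;
  return c;
}
```

Starting from the operand text "ab",0 the first call returns 'a' and leaves ,"b",0 in the buffer, so the scanner next sees a comma followed by a shorter string constant.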

94. ⟨Scan a decimal constant 94⟩ ≡
  acc.h = 0, acc.l = *p - '0';
  for (p++; isdigit(*p); p++) {
    acc = oplus(acc, shift_left(acc, 2));
    acc = incr(shift_left(acc, 1), *p - '0');
  }
  constant_found: val_ptr++;
  top_val.link = NULL;
  top_val.equiv = acc;
  top_val.status = pure;

This code is used in section 86.
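The loop multiplies the accumulator by ten without a multiplication routine: adding acc to four times acc gives five times acc, and one more left shift gives ten times acc. A sketch with a native 64-bit integer in place of the octa routines follows; the name `scan_decimal` and its interface are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: *pp points at a decimal digit. Returns the value
   of the digit string (modulo 2^64) and leaves *pp just past it. */
static uint64_t scan_decimal(const char **pp) {
  const char *p = *pp;
  uint64_t acc;
  assert(isdigit((unsigned char)*p));
  acc = (uint64_t)(*p - '0');
  for (p++; isdigit((unsigned char)*p); p++) {
    acc = acc + (acc << 2);                   /* acc times 5 */
    acc = (acc << 1) + (uint64_t)(*p - '0');  /* acc times 10, plus the new digit */
  }
  *pp = p;
  return acc;
}
```

As with the octa arithmetic, overflow is not detected in this sketch; the result simply wraps around modulo 2^64.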

95. ⟨Scan a hexadecimal constant 95⟩ ≡
  if (!isxdigit(*p)) err("illegal hexadecimal constant");
  acc.h = acc.l = 0;
  for ( ; isxdigit(*p); p++) {
    acc = incr(shift_left(acc, 4), *p - '0');
    if (*p >= 'a') acc = incr(acc, '0' - 'a' + 10);
    else if (*p >= 'A') acc = incr(acc, '0' - 'A' + 10);
  }
  goto constant_found;

This code is used in section 86.
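The digit conversion here first adds the character's distance from '0' and then applies a negative correction for letters. The sketch below reproduces that two-step conversion with a native 64-bit accumulator; `scan_hex` is a hypothetical name, and the sketch assumes an ASCII-compatible character set in which digits precede uppercase letters, which precede lowercase letters.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Hypothetical helper: *pp points at a hexadecimal digit. Returns the
   value of the digit string (modulo 2^64) and leaves *pp just past it. */
static uint64_t scan_hex(const char **pp) {
  const char *p = *pp;
  uint64_t acc = 0;
  assert(isxdigit((unsigned char)*p));
  for (; isxdigit((unsigned char)*p); p++) {
    acc = (acc << 4) + (uint64_t)(*p - '0');
    /* letters need a correction; unsigned wraparound makes the
       negative adjustment come out right */
    if (*p >= 'a') acc += (uint64_t)('0' - 'a' + 10);
    else if (*p >= 'A') acc += (uint64_t)('0' - 'A' + 10);
  }
  *pp = p;
  return acc;
}
```

For the letter 'b', for example, the first step adds 50 and the correction subtracts 39, leaving the digit value 11.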

96. ⟨Scan the current location 96⟩ ≡
  acc = cur_loc;
  goto constant_found;

This code is used in section 86.

97. ⟨Scan a binary operator or closing token, rt_op 97⟩ ≡
  switch (*p++) {
    case '+': rt_op = plus; break;
    case '-': rt_op = minus; break;
    case '*': rt_op = times; break;
    case '/': if (*p != '/') rt_op = over;
      else p++, rt_op = frac; break;
    case '%': rt_op = mod; break;
    case '<': rt_op = shl; goto sh_check;
    case '>': rt_op = shr;
    sh_check: p++; if (*(p - 1) == *(p - 2)) break;
      derr("syntax error at `%c'", *(p - 2));
    case '&': rt_op = and; break;
    case '|': rt_op = or; break;
    case '^': rt_op = xor; break;
    case ')': rt_op = inner_rp; break;
    case '\0': case ',': rt_op = outer_rp; break;
    default: derr("syntax error at `%c'", *(p - 1));
  }

This code is used in section 85.

98. ⟨Perform the top operation on op_stack 98⟩ ≡
  switch (op_stack[--op_ptr]) {
    case inner_lp: if (rt_op == inner_rp) goto scan_close;
      err("*missing right parenthesis"); break;
    case outer_lp: if (rt_op == outer_rp) {
        if (top_val.status == reg_val && (top_val.equiv.l > 0xff || top_val.equiv.h)) {
          err("*register number too large, will be reduced mod 256");
          top_val.equiv.h = 0, top_val.equiv.l &= 0xff;
        }
        if (!*(p - 1)) goto operands_done;
        else rt_op = outer_lp; goto hold_op; /* comma */
      } else {
        op_ptr++; err("*missing left parenthesis");
        goto scan_close;
      }
    ⟨Cases for unary operators 100⟩
    ⟨Cases for binary operators 99⟩
  }

This code is used in section 85.



acc: octa, §83.
and = 13, §82.
cur_loc: octa, §43.
derr = macro (), §45.
equiv: octa, §82.
err = macro (), §45.
frac = 9, §82.
h: tetra, §26.
hold_op: label, §85.
incr: octa (), MMIX-ARITH §6.
inner_lp = 4, §82.
inner_rp = 18, §82.
isdigit: int (), <ctype.h>.
isxdigit: int (), <ctype.h>.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
op_ptr: int, §83.
op_stack: stack_op *, §83.
operands_done: label, §85.
oplus: octa (), MMIX-ARITH §5.
or = 14, §82.
outer_lp = 16, §82.
outer_rp = 17, §82.
over = 8, §82.
p: register Char *, §40.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
rt_op: stack_op, §83.
scan_close: label, §85.
shift_left: octa (), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
times = 7, §82.
top_val = macro, §83.
val_ptr: int, §83.
xor = 15, §82.
99. Now we come to the part where equivalents are changed by unary or binary
operators found in the expression being scanned.

The most typical operator, and in some ways the fussiest one to deal with, is binary
addition. Once we’ve written the code for this case, the other cases almost take care
of themselves.

⟨Cases for binary operators 99⟩ ≡
  case plus: if (top_val.status == undefined) err("cannot add an undefined quantity");
    if (next_val.status == undefined) err("cannot add to an undefined quantity");
    if (top_val.status == reg_val && next_val.status == reg_val)
      err("cannot add two register numbers");
    next_val.equiv = oplus(next_val.equiv, top_val.equiv);
  fin_bin: next_val.status = (top_val.status == next_val.status ? pure : reg_val); val_ptr--;
  delink: top_val.link = NULL; break;

See also section 101.

This code is used in section 98.

100. #define unary_check(verb)
    if (top_val.status != pure) derr("can %s pure values only", verb)

⟨Cases for unary operators 100⟩ ≡
  case negate: unary_check("negate");
    top_val.equiv = ominus(zero_octa, top_val.equiv); goto delink;
  case complement: unary_check("complement");
    top_val.equiv.h = ~top_val.equiv.h, top_val.equiv.l = ~top_val.equiv.l;
    goto delink;
  case registerize: unary_check("registerize");
    top_val.status = reg_val; goto delink;
  case serialize: if (!top_val.link) err("can take serial number of symbol only");
    top_val.equiv.h = 0, top_val.equiv.l = top_val.link->sym->serial;
    top_val.status = pure; goto delink;

This code is used in section 98.

101. #define binary_check(verb)
    if (top_val.status != pure || next_val.status != pure)
      derr("can %s pure values only", verb)

⟨Cases for binary operators 99⟩ +≡
  case minus: if (top_val.status == undefined)
      err("cannot subtract an undefined quantity");
    if (next_val.status == undefined)
      err("cannot subtract from an undefined quantity");
    if (top_val.status == reg_val && next_val.status != reg_val)
      err("cannot subtract register number from pure value");
    next_val.equiv = ominus(next_val.equiv, top_val.equiv); goto fin_bin;
  case times: binary_check("multiply");
    next_val.equiv = omult(next_val.equiv, top_val.equiv); goto fin_bin;
  case over: case mod: binary_check("divide");
    if (top_val.equiv.l == 0 && top_val.equiv.h == 0) err("*division by zero");
    next_val.equiv = odiv(zero_octa, next_val.equiv, top_val.equiv);
    if (op_stack[op_ptr] == mod) next_val.equiv = aux;
    goto fin_bin;
  case frac: binary_check("compute a ratio of");
    if (next_val.equiv.h >= top_val.equiv.h && (next_val.equiv.l >=
        top_val.equiv.l || next_val.equiv.h > top_val.equiv.h)) err("*illegal fraction");
    next_val.equiv = odiv(next_val.equiv, zero_octa, top_val.equiv); goto fin_bin;
  case shl: case shr: binary_check("compute a bitwise shift of");
    if (top_val.equiv.h || top_val.equiv.l > 63) next_val.equiv = zero_octa;
    else if (op_stack[op_ptr] == shl)
      next_val.equiv = shift_left(next_val.equiv, top_val.equiv.l);
    else next_val.equiv = shift_right(next_val.equiv, top_val.equiv.l, 1);
    goto fin_bin;
  case and: binary_check("compute bitwise and of");
    next_val.equiv.h &= top_val.equiv.h, next_val.equiv.l &= top_val.equiv.l;
    goto fin_bin;
  case or: binary_check("compute bitwise or of");
    next_val.equiv.h |= top_val.equiv.h, next_val.equiv.l |= top_val.equiv.l;
    goto fin_bin;
  case xor: binary_check("compute bitwise xor of");
    next_val.equiv.h ^= top_val.equiv.h, next_val.equiv.l ^= top_val.equiv.l;
    goto fin_bin;



and = 13, §82.
aux: octa, MMIX-ARITH §4.
complement = 2, §82.
derr = macro (), §45.
equiv: octa, §82.
err = macro (), §45.
frac = 9, §82.
h: tetra, §26.
l: tetra, §26.
link: trie_node *, §82.
minus = 6, §82.
mod = 10, §82.
negate = 0, §82.
next_val = macro, §83.
odiv: octa (), MMIX-ARITH §13.
ominus: octa (), MMIX-ARITH §5.
omult: octa (), MMIX-ARITH §8.
op_ptr: int, §83.
op_stack: stack_op *, §83.
oplus: octa (), MMIX-ARITH §5.
or = 14, §82.
over = 8, §82.
plus = 5, §82.
pure = 0, §82.
reg_val = 1, §82.
registerize = 3, §82.
serial: int, §58.
serialize = 1, §82.
shift_left: octa (), MMIX-ARITH §7.
shift_right: octa (), MMIX-ARITH §7.
shl = 11, §82.
shr = 12, §82.
status: stat, §82.
sym: sym_node *, §54.
times = 7, §82.
top_val = macro, §83.
undefined = 2, §82.
val_ptr: int, §83.
xor = 15, §82.
zero_octa: octa, MMIX-ARITH §4.
102. Assembling an instruction. Now let’s move up from the expression level
to the instruction level. We get to this part of the program at the beginning of a
line, or after a semicolon at the end of an instruction earlier on the current line. Our
current position in the buffer is the value of buf_ptr.

⟨Process the next MMIXAL instruction or comment 102⟩ ≡
  p = buf_ptr; buf_ptr = "";
  ⟨Scan the label field; goto bypass if there is none 103⟩;
  ⟨Scan the opcode field; goto bypass if there is none 104⟩;
  ⟨Copy the operand field 106⟩;
  buf_ptr = p;
  if (spec_mode && !(op_bits & spec_bit))
    derr("cannot use `%s' in special mode", op_field);
  if ((op_bits & no_label_bit) && lab_field[0]) {
    derr("*label field of `%s' instruction is ignored", op_field);
    lab_field[0] = '\0';
  }
  if (op_bits & align_bits) ⟨Align the location pointer 107⟩;
  ⟨Scan the operand field 85⟩;
  if (opcode == GREG) ⟨Allocate a global register 108⟩;
  if (lab_field[0]) ⟨Define the label 109⟩;
  ⟨Do the operation 116⟩;
  bypass: ;

This code is used in section 136.

103. ⟨ Scan the label field; goto bypass if there is none 103 ⟩ ≡
  if (!*p) goto bypass;
  q = lab_field;
  if (!isspace(*p)) {
    if (!isdigit(*p) && !isletter(*p)) goto bypass; /* comment */
    for (*q++ = *p++; isdigit(*p) || isletter(*p); p++, q++) *q = *p;
    if (*p && !isspace(*p)) derr("label syntax error at `%c'", *p);
  }
  *q = '\0';
  if (isdigit(lab_field[0]) && (lab_field[1] != 'H' || lab_field[2]))
    derr("improper local label `%s'", lab_field);
  for (p++; isspace(*p); p++) ;

This code is used in section 102. 

104. We copy the opcode field to a special buffer because we might want to refer 
to the symbolic opcode in error messages. 

⟨ Scan the opcode field; goto bypass if there is none 104 ⟩ ≡
  q = op_field; while (isletter(*p) || isdigit(*p)) *q++ = *p++; *q = '\0';
  if (!isspace(*p) && *p && op_field[0]) derr("opcode syntax error at `%c'", *p);
  pp = trie_search(op_root, op_field)->sym;
  if (!pp) {
    if (op_field[0]) derr("unknown operation code `%s'", op_field);
    if (lab_field[0]) derr("*no opcode; label `%s' will be ignored", lab_field);
    goto bypass;
  }
  opcode = pp->equiv.h, op_bits = pp->equiv.l;
  while (isspace(*p)) p++;

This code is used in section 102. 

105. ⟨ Global variables 27 ⟩ +≡
  tetra opcode; /* numeric code for MMIX operation or MMIXAL pseudo-op */
  tetra op_bits; /* flags describing an operator’s special characteristics */

106. We copy the operand field to a special buffer so that we can change string 
constants while scanning them later. 

⟨ Copy the operand field 106 ⟩ ≡
  q = operand_list;
  while (*p) {
    if (*p == ';') break;
    if (*p == '\'') {
      *q++ = *p++;
      if (!*p) err("incomplete character constant");
      *q++ = *p++;
      if (*p != '\'') err("illegal character constant");
    } else if (*p == '\"') {
      for (*q++ = *p++; *p && *p != '\"'; p++, q++) *q = *p;
      if (!*p) err("incomplete string constant");
    }
    *q++ = *p++;
    if (isspace(*p)) break;
  }
  while (isspace(*p)) p++;
  if (*p == ';') p++;
  else p = ""; /* if not followed by semicolon, rest of the line is a comment */
  if (q == operand_list) *q++ = '0'; /* change empty operand field to ‘0’ */
  *q = '\0';

This code is used in section 102. 



align_bits = #30000, §62.
buf_ptr: Char *, §33.
derr = macro ( ), §45.
equiv: octa, §58.
err = macro ( ), §45.
GREG = #106, §62.
h: tetra, §26.
isdigit: int ( ), <ctype.h>.
isletter = macro ( ), §57.
isspace: int ( ), <ctype.h>.
l: tetra, §26.
lab_field: Char *, §33.
no_label_bit = #40000, §62.
op_field: Char *, §33.
op_root: trie_node *, §56.
operand_list: Char *, §33.
p: register Char *, §40.
pp: register sym_node *, §65.
q: register Char *, §40.
spec_bit = #100000, §62.
spec_mode: bool, §43.
sym: sym_node *, §54.
tetra = unsigned int, §26.
trie_search: trie_node *( ), §57.
107. It is important to do the alignment in this step before defining the label or
evaluating the operand field.

⟨ Align the location pointer 107 ⟩ ≡
  {
    j = (op_bits & align_bits) >> 16;
    acc.h = -1, acc.l = -(1 << j);
    cur_loc = oand(incr(cur_loc, (1 << j) - 1), acc);
  }

This code is used in section 102. 
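
The rounding performed in §107 can be tried out on ordinary 64-bit integers. The following sketch is not part of MMIXAL; it assumes a compiler that provides uint64_t in place of the two-tetra octa, and the function name align_up is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round loc up to the next multiple of 2^j, where j is the two-bit
   alignment code that section 107 extracts from op_bits. */
uint64_t align_up(uint64_t loc, int j)
{
  uint64_t unit;
  assert(j >= 0 && j <= 3);
  unit = (uint64_t)1 << j;
  /* adding unit-1 plays the role of incr; the mask plays the role of acc */
  return (loc + (unit - 1)) & ~(unit - 1);
}
```

With j = 2, for instance, location #1001 moves up to #1004, while a location that is already a multiple of 4 stays where it is.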

108. ⟨ Allocate a global register 108 ⟩ ≡
  {
    if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
      for (j = greg; j < 255; j++)
        if (greg_val[j].l == val_stack[0].equiv.l && greg_val[j].h == val_stack[0].equiv.h) {
          cur_greg = j;
          goto got_greg;
        }
    }
    if (greg == 32) err("too many global registers");
    greg--;
    greg_val[greg] = val_stack[0].equiv; cur_greg = greg;
  got_greg: ;
  }

This code is used in section 102. 
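
The allocation policy of §108 can be sketched on its own as follows. This is an illustration rather than the program's code: the function alloc_greg is a hypothetical name, octabytes are modeled as uint64_t, and the error exit is modeled by returning -1.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t greg_val[256]; /* values of the allocated global registers */
static int greg = 255;         /* lowest register allocated so far */

/* Return the register that holds value, allocating a new one if needed;
   return -1 when no register is left. A zero value is never shared. */
int alloc_greg(uint64_t value)
{
  int j;
  assert(greg >= 32 && greg <= 255);
  if (value != 0)
    for (j = greg; j < 255; j++)
      if (greg_val[j] == value) return j; /* reuse an equal constant */
  if (greg == 32) return -1; /* "too many global registers" */
  greg--;
  greg_val[greg] = value;
  return greg;
}
```

Two GREG instructions with the same nonzero operand therefore share one register, whereas every GREG 0 receives a fresh one.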

109. If the label is, say 2H, we will already have used the old value of 2B when 
evaluating the operands. Furthermore, an operand of 2F will have been treated as 
undefined, which it still is. 

Symbols can be defined more than once, but only if each definition gives them the 
same equivalent value. 

A warning message is given when a predefined symbol is being redefined, if its 
predefined value has already been used. 

⟨ Define the label 109 ⟩ ≡
  {
    sym_node *new_link = DEFINED;
    acc = cur_loc;
    if (opcode == IS) {
      if (val_stack[0].status == undefined) err("the operand is undefined");
      cur_loc = val_stack[0].equiv;
      if (val_stack[0].status == reg_val) new_link = REGISTER;
    } else if (opcode == GREG) cur_loc.h = 0, cur_loc.l = cur_greg, new_link = REGISTER;
    ⟨ Find the symbol table node, pp 111 ⟩;
    if (pp->link == DEFINED || pp->link == REGISTER) {
      if (pp->equiv.l != cur_loc.l || pp->equiv.h != cur_loc.h || pp->link != new_link) {
        if (pp->serial) derr("symbol `%s' is already defined", lab_field);
        pp->serial = ++serial_number;
        derr("*redefinition of predefined symbol `%s'", lab_field);
      }
    } else if (pp->link == PREDEFINED) pp->serial = ++serial_number;
    else if (pp->link) {
      if (new_link == REGISTER) err("future reference cannot be to a register");
      do ⟨ Fix prior references to this label 112 ⟩ while (pp->link);
    }
    if (isdigit(lab_field[0])) pp = &backward_local[lab_field[0] - '0'];
    pp->equiv = cur_loc; pp->link = new_link;
    ⟨ Fix references that might be in the val_stack 110 ⟩;
    if (listing_file && (opcode == IS || opcode == LOC))
      ⟨ Make special listing to show the label equivalent 115 ⟩;
    cur_loc = acc;
  }

This code is used in section 102. 

110. ⟨ Fix references that might be in the val_stack 110 ⟩ ≡
  if (!isdigit(lab_field[0]))
    for (j = 0; j < val_ptr; j++)
      if (val_stack[j].status == undefined && val_stack[j].link->sym == pp) {
        val_stack[j].status = (new_link == REGISTER ? reg_val : pure);
        val_stack[j].equiv = cur_loc;
      }

This code is used in section 109. 

111. ⟨ Find the symbol table node, pp 111 ⟩ ≡
  if (isdigit(lab_field[0])) pp = &forward_local[lab_field[0] - '0'];
  else {
    if (lab_field[0] == ':') tt = trie_search(trie_root, lab_field + 1);
    else tt = trie_search(cur_prefix, lab_field);
    pp = tt->sym;
    if (!pp) pp = tt->sym = new_sym_node(true);
  }

This code is used in section 109. 



acc: octa, §83.
align_bits = #30000, §62.
backward_local: sym_node [ ], §90.
cur_greg: int, §143.
cur_loc: octa, §43.
cur_prefix: trie_node *, §56.
DEFINED = macro, §58.
derr = macro ( ), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro ( ), §45.
forward_local: sym_node [ ], §90.
GREG = #106, §62.
greg: int, §143.
greg_val: octa [ ], §133.
h: tetra, §26.
incr: octa ( ), MMIX-ARITH §6.
IS = #101, §62.
isdigit: int ( ), <ctype.h>.
j: register int, §136.
l: tetra, §26.
lab_field: Char *, §33.
link: sym_node *, §58.
link: trie_node *, §82.
listing_file: FILE *, §139.
LOC = #102, §62.
new_sym_node: sym_node *( ), §59.
oand: octa ( ), MMIX-ARITH §25.
op_bits: tetra, §105.
opcode: tetra, §105.
pp: register sym_node *, §65.
PREDEFINED = macro, §58.
pure = 0, §82.
reg_val = 1, §82.
REGISTER = macro, §58.
serial: int, §58.
serial_number: int, §60.
status: stat, §82.
sym: sym_node *, §54.
sym_node = struct, §58.
trie_root: trie_node *, §56.
trie_search: trie_node *( ), §57.
true = 1, §26.
tt: register trie_node *, §65.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
112. ⟨ Fix prior references to this label 112 ⟩ ≡
  {
    qq = pp->link;
    pp->link = qq->link;
    mmo_loc();
    if (qq->serial == fix_o) ⟨ Fix a future reference from an octabyte 113 ⟩
    else ⟨ Fix a future reference from a relative address 114 ⟩;
    recycle_fixup(qq);
  }

This code is used in section 109. 

113. ⟨ Fix a future reference from an octabyte 113 ⟩ ≡
  {
    if (qq->equiv.h & 0xffffff) {
      mmo_lop(lop_fixo, 0, 2);
      mmo_tetra(qq->equiv.h);
    } else mmo_lop(lop_fixo, qq->equiv.h >> 24, 1);
    mmo_tetra(qq->equiv.l);
  }

This code is used in section 112. 

114. ⟨ Fix a future reference from a relative address 114 ⟩ ≡
  {
    octa o;
    o = ominus(cur_loc, qq->equiv);
    if (o.l & 3)
      dderr("*relative address in location #%08x%08x not divisible by 4",
        qq->equiv.h, qq->equiv.l);
    o = shift_right(o, 2, 0); k = 0;
    if (o.h == 0)
      if (o.l < 0x10000) mmo_lopp(lop_fixr, o.l);
      else if (qq->serial == fix_xyz && o.l < 0x1000000) {
        mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l);
      } else k = 1;
    else if (o.h == 0xffffffff)
      if (qq->serial == fix_xyz && o.l >= 0xff000000) {
        mmo_lop(lop_fixrx, 0, 24); mmo_tetra(o.l & 0x1ffffff);
      } else if (qq->serial == fix_yz && o.l >= 0xffff0000) {
        mmo_lop(lop_fixrx, 0, 16); mmo_tetra(o.l & 0x100ffff);
      } else k = 1;
    else k = 1;
    if (k) dderr("relative address in location #%08x%08x is too far away",
      qq->equiv.h, qq->equiv.l);
  }

This code is used in section 112. 
115. ⟨ Make special listing to show the label equivalent 115 ⟩ ≡
  if (new_link == DEFINED) {
    fprintf(listing_file, "(%08x%08x)", cur_loc.h, cur_loc.l);
    flush_listing_line(" ");
  } else {
    fprintf(listing_file, "($%03d)", cur_loc.l & 0xff);
    flush_listing_line("             ");
  }

This code is used in section 109. 

116. ⟨ Do the operation 116 ⟩ ≡
  future_bits = 0;
  if (op_bits & many_arg_bit) ⟨ Do a many-operand operation 117 ⟩
  else switch (val_ptr) {
  case 1: if (!(op_bits & one_arg_bit))
      derr("opcode `%s' needs more than one operand", op_field);
    ⟨ Do a one-operand operation 129 ⟩;
  case 2: if (!(op_bits & two_arg_bit))
      if (op_bits & one_arg_bit)
        derr("opcode `%s' must not have two operands", op_field)
      else derr("opcode `%s' must have more than two operands", op_field);
    if ((op_bits & (three_arg_bit + mem_bit)) == three_arg_bit) goto make_two_three;
    ⟨ Do a two-operand operation 124 ⟩;
  make_two_three: val_stack[2] = val_stack[1], val_ptr = 3;
    val_stack[1].equiv = zero_octa, val_stack[1].link = NULL, val_stack[1].status = pure;
      /* insert 0 as the second operand */
  case 3: if (!(op_bits & three_arg_bit))
      derr("opcode `%s' must not have three operands", op_field);
    ⟨ Do a three-operand operation 119 ⟩;
  default: derr("too many operands for opcode `%s'", op_field);
  }

This code is used in section 102. 



cur_loc: octa, §43.
dderr = macro ( ), §45.
DEFINED = macro, §58.
derr = macro ( ), §45.
equiv: octa, §58.
equiv: octa, §82.
fix_o = 0, §58.
fix_xyz = 2, §58.
fix_yz = 1, §58.
flush_listing_line: void ( ), §41.
fprintf: int ( ), <stdio.h>.
future_bits: int, §120.
h: tetra, §26.
k: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
link: trie_node *, §82.
listing_file: FILE *, §139.
lop_fixo = #3, §24.
lop_fixr = #4, §24.
lop_fixrx = #5, §24.
many_arg_bit = #8000, §62.
mem_bit = #80000, §62.
mmo_loc: void ( ), §49.
mmo_lop: void ( ), §48.
mmo_lopp: void ( ), §48.
mmo_tetra: void ( ), §48.
new_link: sym_node *, §109.
octa = struct, §26.
ominus: octa ( ), MMIX-ARITH §5.
one_arg_bit = #1000, §62.
op_bits: tetra, §105.
op_field: Char *, §33.
pp: register sym_node *, §65.
pure = 0, §82.
qq: register sym_node *, §65.
recycle_fixup = macro ( ), §59.
serial: int, §58.
shift_right: octa ( ), MMIX-ARITH §7.
status: stat, §82.
three_arg_bit = #4000, §62.
two_arg_bit = #2000, §62.
val_ptr: int, §83.
val_stack: val_node *, §83.
zero_octa: octa, MMIX-ARITH §4.
117. The many-operand operators are BYTE, WYDE, TETRA, and OCTA.

⟨ Do a many-operand operation 117 ⟩ ≡
  for (j = 0; j < val_ptr; j++) {
    ⟨ Deal with cases where val_stack[j] is impure 118 ⟩;
    k = 1 << (opcode - BYTE);
    if ((val_stack[j].equiv.h && opcode < OCTA) ||
        (val_stack[j].equiv.l > 0xffff && opcode < TETRA) ||
        (val_stack[j].equiv.l > 0xff && opcode < WYDE))
      if (k == 1) err("*constant doesn't fit in one byte")
      else derr("*constant doesn't fit in %d bytes", k);
    if (k < 8) assemble(k, val_stack[j].equiv.l, 0);
    else if (val_stack[j].status == undefined) assemble(4, 0, 0xf0), assemble(4, 0, 0xf0);
    else assemble(4, val_stack[j].equiv.h, 0), assemble(4, val_stack[j].equiv.l, 0);
  }

This code is used in section 116. 
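
The size test in §117 amounts to asking whether a constant fits in k = 1, 2, 4, or 8 bytes. A compact way to express the same check, assuming a 64-bit integer type instead of the octa pair and using the hypothetical name fits_in_bytes, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if value can be stored in k bytes, where k is 1, 2, 4, or 8
   (that is, 1 << (opcode - BYTE) in the notation of section 117). */
int fits_in_bytes(uint64_t value, int k)
{
  assert(k == 1 || k == 2 || k == 4 || k == 8);
  if (k == 8) return 1;            /* an octabyte holds anything */
  return (value >> (8 * k)) == 0;  /* no bits above the low k bytes */
}
```

Note that MMIXAL treats a failure of this test as a warning (the message begins with an asterisk) and still assembles the low-order bytes.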

118. ⟨ Deal with cases where val_stack[j] is impure 118 ⟩ ≡
  if (val_stack[j].status == reg_val) err("*register number used as a constant")
  else if (val_stack[j].status == undefined) {
    if (opcode != OCTA) err("undefined constant");
    pp = val_stack[j].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_o;
    qq->equiv = cur_loc;
  }

This code is used in section 117. 

119. ⟨ Do a three-operand operation 119 ⟩ ≡
  ⟨ Do the Z field 121 ⟩;
  ⟨ Do the Y field 122 ⟩;
assemble_X: ⟨ Do the X field 123 ⟩;
assemble_inst: assemble(4, (opcode << 24) + xyz, future_bits);
  break;

This code is used in section 116. 

120. Individual fields of an instruction are placed into global variables z, y, x, yz,
and/or xyz.

⟨ Global variables 27 ⟩ +≡
  tetra z, y, x, yz, xyz; /* pieces for assembly */
  int future_bits; /* places where there are future references */

121. ⟨ Do the Z field 121 ⟩ ≡
  if (val_stack[2].status == undefined) err("Z field is undefined");
  if (val_stack[2].status == reg_val) {
    if (!(op_bits & (immed_bit + zr_bit + zar_bit)))
      derr("*Z field of `%s' should not be a register number", op_field);
  } else if (op_bits & immed_bit) opcode++; /* immediate */
  else if (op_bits & zr_bit)
    derr("*Z field of `%s' should be a register number", op_field);
  if (val_stack[2].equiv.h || val_stack[2].equiv.l > 0xff)
    err("*Z field doesn't fit in one byte");
  z = val_stack[2].equiv.l & 0xff;

This code is used in section 119. 

122. ⟨ Do the Y field 122 ⟩ ≡
  if (val_stack[1].status == undefined) err("Y field is undefined");
  if (val_stack[1].status == reg_val) {
    if (!(op_bits & (yr_bit + yar_bit)))
      derr("*Y field of `%s' should not be a register number", op_field);
  } else if (op_bits & yr_bit)
    derr("*Y field of `%s' should be a register number", op_field);
  if (val_stack[1].equiv.h || val_stack[1].equiv.l > 0xff)
    err("*Y field doesn't fit in one byte");
  y = val_stack[1].equiv.l & 0xff; yz = (y << 8) + z;

This code is used in section 119. 

123. ⟨ Do the X field 123 ⟩ ≡
  if (val_stack[0].status == undefined) err("X field is undefined");
  if (val_stack[0].status == reg_val) {
    if (!(op_bits & (xr_bit + xar_bit)))
      derr("*X field of `%s' should not be a register number", op_field);
  } else if (op_bits & xr_bit)
    derr("*X field of `%s' should be a register number", op_field);
  if (val_stack[0].equiv.h || val_stack[0].equiv.l > 0xff)
    err("*X field doesn't fit in one byte");
  x = val_stack[0].equiv.l & 0xff; xyz = (x << 16) + yz;

This code is used in section 119. 
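
Sections 119 through 123 build the instruction word from right to left: first z, then yz, then xyz, and finally the opcode byte. The sketch below repeats that arithmetic in a single function; pack_inst is a hypothetical name that does not occur in MMIXAL, and the field values are assumed to have been range-checked already.

```c
#include <assert.h>
#include <stdint.h>

/* Combine an opcode byte with the X, Y, and Z bytes into one tetrabyte,
   following yz = (y<<8)+z, xyz = (x<<16)+yz, and (opcode<<24)+xyz. */
uint32_t pack_inst(uint32_t opcode, uint32_t x, uint32_t y, uint32_t z)
{
  uint32_t yz, xyz;
  assert(opcode <= 0xff && x <= 0xff && y <= 0xff && z <= 0xff);
  yz = (y << 8) + z;
  xyz = (x << 16) + yz;
  return (opcode << 24) + xyz;
}
```

For example, ADD has opcode #20, so ADD $1,$2,$3 corresponds to pack_inst(0x20, 1, 2, 3), the tetrabyte #20010203.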



assemble: void ( ), §52.
BYTE = #108, §62.
cur_loc: octa, §43.
derr = macro ( ), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro ( ), §45.
false = 0, §26.
fix_o = 0, §58.
h: tetra, §26.
immed_bit = #2, §62.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
new_sym_node: sym_node *( ), §59.
OCTA = #10b, §62.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
qq: register sym_node *, §65.
reg_val = 1, §82.
serial: int, §58.
status: stat, §82.
sym: sym_node *, §54.
TETRA = #10a, §62.
tetra = unsigned int, §26.
undefined = 2, §82.
val_ptr: int, §83.
val_stack: val_node *, §83.
WYDE = #109, §62.
xar_bit = #40, §62.
xr_bit = #80, §62.
yar_bit = #10, §62.
yr_bit = #20, §62.
zar_bit = #4, §62.
zr_bit = #8, §62.
124. ⟨ Do a two-operand operation 124 ⟩ ≡
  if (val_stack[1].status == undefined) {
    if (op_bits & rel_addr_bit)
      ⟨ Assemble YZ as a future reference and goto assemble_X 125 ⟩
    else err("YZ field is undefined");
  } else if (val_stack[1].status == reg_val) {
    if (!(op_bits & (immed_bit + yzr_bit + yzar_bit)))
      derr("*YZ field of `%s' should not be a register number", op_field);
    if (opcode == SET) val_stack[1].equiv.l <<= 8, opcode = 0xc1; /* change to OR */
    else if (op_bits & mem_bit) val_stack[1].equiv.l <<= 8, opcode++;
      /* silently append ,0 */
  } else { /* val_stack[1].status == pure */
    if (op_bits & mem_bit)
      ⟨ Assemble YZ as a memory address and goto assemble_X 127 ⟩;
    if (opcode == SET) opcode = 0xe3; /* change to SETL */
    else if (op_bits & immed_bit) opcode++; /* immediate */
    else if (op_bits & yzr_bit) {
      derr("*YZ field of `%s' should be a register number", op_field);
    }
    if (op_bits & rel_addr_bit)
      ⟨ Assemble YZ as a relative address and goto assemble_X 126 ⟩;
  }
  if (val_stack[1].equiv.h || val_stack[1].equiv.l > 0xffff)
    err("*YZ field doesn't fit in two bytes");
  yz = val_stack[1].equiv.l & 0xffff;
  goto assemble_X;

This code is used in section 116. 

125. ⟨ Assemble YZ as a future reference and goto assemble_X 125 ⟩ ≡
  {
    pp = val_stack[1].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_yz;
    qq->equiv = cur_loc;
    yz = 0;
    future_bits = 0xc0;
    goto assemble_X;
  }

This code is used in section 124. 

126. ⟨ Assemble YZ as a relative address and goto assemble_X 126 ⟩ ≡
  {
    octa source, dest;
    if (val_stack[1].equiv.l & 3) err("*relative address is not divisible by 4");
    source = shift_right(cur_loc, 2, 0);
    dest = shift_right(val_stack[1].equiv, 2, 0);
    acc = ominus(dest, source);
    if (!(acc.h & 0x80000000)) {
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #ffff tetrabytes forward");
    } else {
      acc = incr(acc, 0x10000);
      opcode++;
      if (acc.l > 0xffff || acc.h)
        err("relative address is more than #10000 tetrabytes backward");
    }
    yz = acc.l;
    goto assemble_X;
  }

This code is used in section 124. 
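
The forward and backward cases of §126 may be easier to follow when written with native 64-bit arithmetic. The following is only a sketch under simplifying assumptions: addresses are treated as unsigned 64-bit byte addresses, and encode_rel16 is a hypothetical helper, not a routine of MMIXAL.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a branch from cur to dest (byte addresses, both multiples of 4).
   On success store the 16-bit YZ field, set *backward to 1 when the
   opcode must be incremented, and return 1. Return 0 if the target is
   out of range. */
int encode_rel16(uint64_t cur, uint64_t dest, uint32_t *yz, int *backward)
{
  int64_t delta;
  assert((cur & 3) == 0 && (dest & 3) == 0);
  delta = (int64_t)(dest >> 2) - (int64_t)(cur >> 2); /* in tetrabytes */
  if (delta >= 0) {
    if (delta > 0xffff) return 0; /* more than #ffff tetrabytes forward */
    *backward = 0;
  } else {
    delta += 0x10000;
    if (delta < 0) return 0;      /* more than #10000 tetrabytes backward */
    *backward = 1;
  }
  *yz = (uint32_t)delta;
  return 1;
}
```

A jump back by four tetrabytes thus yields YZ = #fffc together with the odd (backward) form of the opcode.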

127. ⟨ Assemble YZ as a memory address and goto assemble_X 127 ⟩ ≡
  {
    octa o;
    o = val_stack[1].equiv, k = 0;
    for (j = greg; j < 255; j++)
      if (greg_val[j].h || greg_val[j].l) {
        acc = ominus(val_stack[1].equiv, greg_val[j]);
        if (acc.h <= o.h && (acc.l <= o.l || acc.h < o.h)) o = acc, k = j;
      }
    if (o.l <= 0xff && !o.h && k) yz = (k << 8) + o.l, opcode++;
    else if (!expanding) err("no base address is close enough to the address A")
    else ⟨ Assemble instructions to put supplementary data in $255 128 ⟩;
    goto assemble_X;
  }

This code is used in section 124. 
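
The search in §127 looks for the global register whose value lies closest below the desired address. A stand-alone sketch of that search, with octabytes modeled as uint64_t and with the hypothetical name pick_base, could read as follows.

```c
#include <assert.h>
#include <stdint.h>

/* Among registers greg..254 with nonzero values, find the one whose value
   is the nearest base at or below addr. Store the remaining offset in
   *off and return the register number, or 0 if no register qualifies. */
int pick_base(const uint64_t greg_val[256], int greg, uint64_t addr, uint64_t *off)
{
  int j, k = 0;
  uint64_t best = addr;
  assert(greg >= 32 && greg <= 255);
  for (j = greg; j < 255; j++)
    if (greg_val[j] != 0) {
      uint64_t d = addr - greg_val[j]; /* wraps to a huge value if the base is above addr */
      if (d <= best) best = d, k = j;
    }
  *off = best;
  return k;
}
```

The instruction can be assembled directly only when the returned register is nonzero and the offset is at most #ff; otherwise MMIXAL either reports an error or, with the -x option, expands the instruction.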



acc: octa, §83.
assemble_X: label, §119.
cur_loc: octa, §43.
derr = macro ( ), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro ( ), §45.
expanding: int, §139.
false = 0, §26.
fix_yz = 1, §58.
future_bits: int, §120.
greg: int, §143.
greg_val: octa [ ], §133.
h: tetra, §26.
immed_bit = #2, §62.
incr: octa ( ), MMIX-ARITH §6.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
mem_bit = #80000, §62.
new_sym_node: sym_node *( ), §59.
octa = struct, §26.
ominus: octa ( ), MMIX-ARITH §5.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
pure = 0, §82.
qq: register sym_node *, §65.
reg_val = 1, §82.
rel_addr_bit = #1, §62.
serial: int, §58.
SET = #100, §62.
shift_right: octa ( ), MMIX-ARITH §7.
status: stat, §82.
sym: sym_node *, §54.
undefined = 2, §82.
val_stack: val_node *, §83.
yz: tetra, §120.
yzar_bit = #100, §62.
yzr_bit = #200, §62.
128. #define SETH 0xe0
#define SETL 0xe3
#define ORH 0xe8
#define ORL 0xeb

⟨ Assemble instructions to put supplementary data in $255 128 ⟩ ≡
  {
    for (j = SETH; j <= ORL; j++) {
      switch (j & 3) {
      case 0: yz = o.h >> 16; break; /* SETH */
      case 1: yz = o.h & 0xffff; break; /* SETMH or ORMH */
      case 2: yz = o.l >> 16; break; /* SETML or ORML */
      case 3: yz = o.l & 0xffff; break; /* SETL or ORL */
      }
      if (yz || j == SETL) {
        assemble(4, (j << 24) + (255 << 16) + yz, 0);
        j |= ORH;
      }
    }
    if (k) yz = (k << 8) + 255; /* Y = $k, Z = $255 */
    else yz = 255 << 8, opcode++; /* Y = $255, Z = 0 */
  }

This code is used in section 127. 
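
The loop of §128 emits at most four instructions, one for each nonzero wyde of the offset, beginning with a SET-type instruction and continuing with OR-type instructions. The sketch below imitates that loop in isolation; it assumes a uint64_t offset in place of the octa o, collects the tetrabytes in an array instead of calling assemble, and uses the hypothetical name expand_255.

```c
#include <assert.h>
#include <stdint.h>

enum { SETH = 0xe0, SETL = 0xe3, ORH = 0xe8, ORL = 0xeb };

/* Write into out[] the instructions that load value into $255, skipping
   zero wydes, and return how many were written (between 1 and 4). */
int expand_255(uint64_t value, uint32_t out[4])
{
  int j, n = 0;
  uint32_t yz;
  for (j = SETH; j <= ORL; j++) {
    /* j & 3 selects the wyde: 0 is the most significant, 3 the least */
    yz = (uint32_t)(value >> (16 * (3 - (j & 3)))) & 0xffff;
    if (yz || j == SETL) {
      assert(n < 4);
      out[n++] = ((uint32_t)j << 24) + (255u << 16) + yz;
      j |= ORH; /* after the first SET, switch to the OR family */
    }
  }
  return n;
}
```

The test j == SETL guarantees that a zero offset still produces one instruction, namely SETL $255,0.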

129. ⟨ Do a one-operand operation 129 ⟩ ≡
  if (val_stack[0].status == undefined) {
    if (op_bits & rel_addr_bit)
      ⟨ Assemble XYZ as a future reference and goto assemble_inst 130 ⟩
    else if (opcode != PREFIX) err("the operand is undefined");
  } else if (val_stack[0].status == reg_val) {
    if (!(op_bits & (xyzr_bit + xyzar_bit)))
      derr("*operand of `%s' should not be a register number", op_field);
  } else { /* val_stack[0].status == pure */
    if (op_bits & xyzr_bit)
      derr("*operand of `%s' should be a register number", op_field);
    if (op_bits & rel_addr_bit)
      ⟨ Assemble XYZ as a relative address and goto assemble_inst 131 ⟩;
  }
  if (opcode > 0xff) ⟨ Do a pseudo-operation and goto bypass 132 ⟩;
  if (val_stack[0].equiv.h || val_stack[0].equiv.l > 0xffffff)
    err("*XYZ field doesn't fit in three bytes");
  xyz = val_stack[0].equiv.l & 0xffffff;
  goto assemble_inst;

This code is used in section 116. 

130. ⟨ Assemble XYZ as a future reference and goto assemble_inst 130 ⟩ ≡
  {
    pp = val_stack[0].link->sym;
    qq = new_sym_node(false);
    qq->link = pp->link;
    pp->link = qq;
    qq->serial = fix_xyz;
    qq->equiv = cur_loc;
    xyz = 0;
    future_bits = 0xe0;
    goto assemble_inst;
  }

This code is used in section 129. 

131. ⟨ Assemble XYZ as a relative address and goto assemble_inst 131 ⟩ ≡
  {
    octa source, dest;
    if (val_stack[0].equiv.l & 3) err("*relative address is not divisible by 4");
    source = shift_right(cur_loc, 2, 0);
    dest = shift_right(val_stack[0].equiv, 2, 0);
    acc = ominus(dest, source);
    if (!(acc.h & 0x80000000)) {
      if (acc.l > 0xffffff || acc.h)
        err("relative address is more than #ffffff tetrabytes forward");
    } else {
      acc = incr(acc, 0x1000000);
      opcode++;
      if (acc.l > 0xffffff || acc.h)
        err("relative address is more than #1000000 tetrabytes backward");
    }
    xyz = acc.l;
    goto assemble_inst;
  }

This code is used in section 129. 



acc: octa, §83.
assemble: void ( ), §52.
assemble_inst: label, §119.
bypass: label, §102.
cur_loc: octa, §43.
derr = macro ( ), §45.
equiv: octa, §82.
equiv: octa, §58.
err = macro ( ), §45.
false = 0, §26.
fix_xyz = 2, §58.
future_bits: int, §120.
h: tetra, §26.
incr: octa ( ), MMIX-ARITH §6.
j: register int, §136.
k: register int, §136.
l: tetra, §26.
link: trie_node *, §82.
link: sym_node *, §58.
new_sym_node: sym_node *( ), §59.
o: octa, §127.
octa = struct, §26.
ominus: octa ( ), MMIX-ARITH §5.
op_bits: tetra, §105.
op_field: Char *, §33.
opcode: tetra, §105.
pp: register sym_node *, §65.
PREFIX = #103, §62.
pure = 0, §82.
qq: register sym_node *, §65.
reg_val = 1, §82.
rel_addr_bit = #1, §62.
serial: int, §58.
shift_right: octa ( ), MMIX-ARITH §7.
status: stat, §82.
sym: sym_node *, §54.
undefined = 2, §82.
val_stack: val_node *, §83.
xyz: tetra, §120.
xyzar_bit = #400, §62.
xyzr_bit = #800, §62.
yz: tetra, §120.
132. ⟨ Do a pseudo-operation and goto bypass 132 ⟩ ≡
  switch (opcode) {
  case LOC: cur_loc = val_stack[0].equiv;
  case IS: goto bypass;
  case PREFIX: if (!val_stack[0].link) err("not a valid prefix");
    cur_prefix = val_stack[0].link; goto bypass;
  case GREG: if (listing_file) ⟨ Make listing for GREG 134 ⟩;
    goto bypass;
  case LOCAL: if (val_stack[0].equiv.l > lreg) lreg = val_stack[0].equiv.l;
    if (listing_file) {
      fprintf(listing_file, "($%03d)", val_stack[0].equiv.l);
      flush_listing_line("             ");
    }
    goto bypass;
  case BSPEC: if (val_stack[0].equiv.l > 0xffff || val_stack[0].equiv.h)
      err("*operand of `BSPEC' doesn't fit in two bytes");
    mmo_loc(); mmo_sync();
    mmo_lopp(lop_spec, val_stack[0].equiv.l);
    spec_mode = true; spec_mode_loc = 0; goto bypass;
  case ESPEC: spec_mode = false; goto bypass;
  }

This code is used in section 129. 

133. ⟨ Global variables 27 ⟩ +≡
  octa greg_val[256]; /* initial values of global registers */

134. ⟨ Make listing for GREG 134 ⟩ ≡
  if (val_stack[0].equiv.l || val_stack[0].equiv.h) {
    fprintf(listing_file, "($%03d=#%08x", cur_greg, val_stack[0].equiv.h);
    flush_listing_line("    ");
    fprintf(listing_file, "         %08x)", val_stack[0].equiv.l);
    flush_listing_line(" ");
  } else {
    fprintf(listing_file, "($%03d)", cur_greg);
    flush_listing_line("             ");
  }

This code is used in section 132. 
135. Running the program. On a UNIX-like system, the command

    mmixal [options] sourcefilename

will assemble the MMIXAL program in file sourcefilename, writing any error messages
on the standard error file. (Nothing is written to the standard output.) The options,
which may appear in any order, are:

• -o objectfilename  Send the output to a binary file called objectfilename. If
no -o specification is given, the object file name is obtained from the input file name
by changing the final letter from ‘s’ to ‘o’, or by appending ‘.mmo’ if sourcefilename
doesn’t end with s.

• -l listingname  Output a listing of the assembled input and output to a text file
called listingname.

• -x  Expand memory-oriented commands that cannot be assembled as single
instructions, by assembling auxiliary instructions that make temporary use of global
register $255.

• -b bufsize  Allow up to bufsize characters per line of input.



BSPEC = #104, §62.
bypass: label, §102.
cur_greg: int, §143.
cur_loc: octa, §43.
cur_prefix: trie_node *, §56.
equiv: octa, §82.
err = macro ( ), §45.
ESPEC = #105, §62.
false = 0, §26.
flush_listing_line: void ( ), §41.
fprintf: int ( ), <stdio.h>.
GREG = #106, §62.
h: tetra, §26.
IS = #101, §62.
l: tetra, §26.
link: trie_node *, §82.
listing_file: FILE *, §139.
LOC = #102, §62.
LOCAL = #107, §62.
lop_spec = #8, §24.
lreg: int, §143.
mmo_loc: void ( ), §49.
mmo_lopp: void ( ), §48.
mmo_sync: void ( ), §50.
octa = struct, §26.
opcode: tetra, §105.
PREFIX = #103, §62.
spec_mode: bool, §43.
spec_mode_loc: tetra, §43.
true = 1, §26.
val_stack: val_node *, §83.
136. Here, finally, is the overall structure of this program.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <time.h>

⟨ Preprocessor definitions 31 ⟩
⟨ Type definitions 26 ⟩
⟨ Global variables 27 ⟩
⟨ Subroutines 28 ⟩

int main(argc, argv)
  int argc; char *argv[];
{
  register int j, k; /* all-purpose integers */
  ⟨ Local variables 40 ⟩;
  ⟨ Process the command line 137 ⟩;
  ⟨ Initialize everything 29 ⟩;
  while (1) {
    ⟨ Get the next line of input text, or break if the input has ended 34 ⟩;
    while (1) {
      ⟨ Process the next MMIXAL instruction or comment 102 ⟩;
      if (!*buf_ptr) break;
    }
    if (listing_file) {
      if (listing_bits) listing_clear();
      else if (!line_listed) flush_listing_line("                   ");
    }
  }
  ⟨ Finish the assembly 142 ⟩;
}

137. The space after "-b" is optional, because MMIX-SIM does not use a space in 
this context. 

⟨ Process the command line 137 ⟩ ≡
  for (j = 1; j < argc - 1 && argv[j][0] == '-'; j++)
    if (!argv[j][2]) {
      if (argv[j][1] == 'x') expanding = 1;
      else if (argv[j][1] == 'o') j++, strcpy(obj_file_name, argv[j]);
      else if (argv[j][1] == 'l') j++, strcpy(listing_name, argv[j]);
      else if (argv[j][1] == 'b' && sscanf(argv[j + 1], "%d", &buf_size) == 1) j++;
      else break;
    } else if (argv[j][1] != 'b' || sscanf(argv[j] + 2, "%d", &buf_size) != 1) break;
  if (j != argc - 1) {
    fprintf(stderr, "Usage: %s %s sourcefilename\n", argv[0],
      "[-x] [-l listingname] [-b buffersize] [-o objectfilename]");
    exit(-1);
  }
  src_file_name = argv[j];

This code is used in section 136. 

138. ⟨ Open the files 138 ⟩ ≡
  src_file = fopen(src_file_name, "r");
  if (!src_file) dpanic("Can't open the source file %s", src_file_name);
  if (!obj_file_name[0]) {
    j = strlen(src_file_name);
    if (src_file_name[j - 1] == 's') {
      strcpy(obj_file_name, src_file_name); obj_file_name[j - 1] = 'o';
    }
    else sprintf(obj_file_name, "%s.mmo", src_file_name);
  }
  obj_file = fopen(obj_file_name, "wb");
  if (!obj_file) dpanic("Can't open the object file %s", obj_file_name);
  if (listing_name[0]) {
    listing_file = fopen(listing_name, "w");
    if (!listing_file) dpanic("Can't open the listing file %s", listing_name);
  }

This code is used in section 140. 
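
The rule for choosing a default object file name (§135 and §138) is easy to isolate. The function below is a sketch with a hypothetical name, default_obj_name; unlike the program itself it takes the destination buffer as a parameter, and the caller is assumed to provide enough room.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Derive the default object file name from src: a final 's' becomes 'o';
   otherwise ".mmo" is appended. The buffer obj must have room for
   strlen(src) + 5 characters. */
void default_obj_name(const char *src, char *obj)
{
  size_t n = strlen(src);
  assert(n > 0);
  if (src[n - 1] == 's') {
    strcpy(obj, src);
    obj[n - 1] = 'o';
  } else sprintf(obj, "%s.mmo", src);
}
```

So hello.mms leads to hello.mmo, while a name such as hello.txt leads to hello.txt.mmo.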

139. ⟨ Global variables 27 ⟩ +≡
  char *src_file_name; /* name of the MMIXAL input file */
  char obj_file_name[FILENAME_MAX + 1]; /* name of the binary output file */
  char listing_name[FILENAME_MAX + 1]; /* name of the optional listing file */
  FILE *src_file, *obj_file, *listing_file;
  int expanding; /* are we expanding instructions when base address fail? */
  int buf_size; /* maximum number of characters per line of input */

140. ⟨ Initialize everything 29 ⟩ +≡
  ⟨ Open the files 138 ⟩;
  filename[0] = src_file_name;
  filename_count = 1;
  ⟨ Output the preamble 141 ⟩;

141. ⟨ Output the preamble 141 ⟩ ≡
  mmo_lop(lop_pre, 1, 1);
  mmo_tetra(time(NULL));
  mmo_cur_file = -1;

This code is used in section 140. 



buf_ptr: Char *, §33.
dpanic = macro ( ), §45.
exit: void ( ), <stdlib.h>.
FILE, <stdio.h>.
filename: Char *[ ], §37.
filename_count: int, §37.
FILENAME_MAX = macro, <stdio.h>.
flush_listing_line: void ( ), §41.
fopen: FILE *( ), <stdio.h>.
fprintf: int ( ), <stdio.h>.
line_listed: bool, §36.
listing_bits: unsigned char, §43.
listing_clear: void ( ), §44.
lop_pre = #9, §24.
mmo_cur_file: int, §51.
mmo_lop: void ( ), §48.
mmo_tetra: void ( ), §48.
sprintf: int ( ), <stdio.h>.
sscanf: int ( ), <stdio.h>.
stderr: FILE *, <stdio.h>.
strcpy: char *( ), <string.h>.
strlen: size_t ( ), <string.h>.
time: time_t ( ), <time.h>.
142. ⟨ Finish the assembly 142 ⟩ ≡
  if (lreg >= greg)
    dpanic("Danger: Must reduce the number of GREGs by %d", lreg - greg + 1);
  ⟨ Output the postamble 144 ⟩;
  ⟨ Check and output the trie 80 ⟩;
  ⟨ Report any undefined local symbols 145 ⟩;
  if (err_count) {
    if (err_count > 1) fprintf(stderr, "(%d errors were found.)\n", err_count);
    else fprintf(stderr, "(One error was found.)\n");
  }
  exit(err_count);

This code is used in section 136. 

143. ⟨ Global variables 27 ⟩ +≡
  int greg = 255; /* global register allocator */
  int cur_greg; /* global register just allocated */
  int lreg = 32; /* local register allocator */

144. ⟨ Output the postamble 144 ⟩ ≡
  mmo_lop(lop_post, 0, greg);
  greg_val[255] = trie_search(trie_root, "Main")->sym->equiv;
  for (j = greg; j < 256; j++) {
    mmo_tetra(greg_val[j].h);
    mmo_tetra(greg_val[j].l);
  }

This code is used in section 142. 

145. ⟨ Report any undefined local symbols 145 ⟩ ≡
  for (j = 0; j < 10; j++)
    if (forward_local[j].link)
      err_count++, fprintf(stderr, "undefined local symbol %dF\n", j);

This code is used in section 142. 
dpanic = macro ( ), §45.
equiv: octa, §58.
err_count: int, §46.
exit: void ( ), <stdlib.h>.
forward_local: sym_node [ ], §90.
fprintf: int ( ), <stdio.h>.
greg_val: octa [ ], §133.
h: tetra, §26.
j: register int, §136.
l: tetra, §26.
link: sym_node *, §58.
lop_post = #a, §24.
mmo_lop: void ( ), §48.
mmo_tetra: void ( ), §48.
stderr: FILE *, <stdio.h>.
sym: sym_node *, §54.
trie_root: trie_node *, §56.
trie_search: trie_node *( ), §57.
146. Names of the sections.

⟨Align the location pointer 107⟩ Used in section 102.
⟨Allocate a global register 108⟩ Used in section 102.
⟨Assemble instructions to put supplementary data in $255 128⟩ Used in section 127.
⟨Assemble XYZ as a future reference and goto assemble_inst 130⟩ Used in section 129.
⟨Assemble XYZ as a relative address and goto assemble_inst 131⟩ Used in section 129.
⟨Assemble YZ as a future reference and goto assemble_X 125⟩ Used in section 124.
⟨Assemble YZ as a memory address and goto assemble_X 127⟩ Used in section 124.
⟨Assemble YZ as a relative address and goto assemble_X 126⟩ Used in section 124.
⟨Cases for binary operators 99, 101⟩ Used in section 98.
⟨Cases for unary operators 100⟩ Used in section 98.
⟨Check and output the trie 80⟩ Used in section 142.
⟨Check for a line directive 38⟩ Used in section 34.
⟨Copy the operand field 106⟩ Used in section 102.
⟨Deal with cases where val_stack[j] is impure 118⟩ Used in section 117.
⟨Define the label 109⟩ Used in section 102.
⟨Do a many-operand operation 117⟩ Used in section 116.
⟨Do a one-operand operation 129⟩ Used in section 116.
⟨Do a pseudo-operation and goto bypass 132⟩ Used in section 129.
⟨Do a three-operand operation 119⟩ Used in section 116.
⟨Do a two-operand operation 124⟩ Used in section 116.
⟨Do the operation 116⟩ Used in section 102.
⟨Do the X field 123⟩ Used in section 119.
⟨Do the Y field 122⟩ Used in section 119.
⟨Do the Z field 121⟩ Used in section 119.
⟨Encode the length of t->sym->equiv 76⟩ Used in section 74.
⟨Find the symbol table node, pp 111⟩ Used in section 109.
⟨Finish the assembly 142⟩ Used in section 136.
⟨Fix a future reference from a relative address 114⟩ Used in section 112.
⟨Fix a future reference from an octabyte 113⟩ Used in section 112.
⟨Fix prior references to this label 112⟩ Used in section 109.
⟨Fix references that might be in the val_stack 110⟩ Used in section 109.
⟨Flush the excess part of an overlong line 35⟩ Used in section 34.
⟨Get the next line of input text, or break if the input has ended 34⟩ Used in section 136.
⟨Global variables 27, 33, 36, 37, 43, 46, 51, 56, 60, 63, 67, 69, 77, 83, 90, 105, 120, 133, 139, 143⟩
    Used in section 136.
⟨Initialize everything 29, 32, 61, 71, 84, 91, 140⟩ Used in section 136.
⟨Local variables 40, 65⟩ Used in section 136.
⟨Make listing for GREG 134⟩ Used in section 132.
⟨Make special listing to show the label equivalent 115⟩ Used in section 109.
⟨Make sure cur_loc and mmo_cur_loc refer to the same tetrabyte 53⟩ Used in section 52.
⟨Open the files 138⟩ Used in section 140.
⟨Output the postamble 144⟩ Used in section 142.
⟨Output the preamble 141⟩ Used in section 140.
⟨Perform the top operation on op_stack 98⟩ Used in section 85.
⟨Preprocessor definitions 31, 39⟩ Used in section 136.
⟨Print symbol sym_buf and its equivalent 78⟩ Used in section 75.
⟨Process the command line 137⟩ Used in section 136.
⟨Process the next MMIXAL instruction or comment 102⟩ Used in section 136.
⟨Put other predefined symbols into the trie 70⟩ Used in section 61.
⟨Put the MMIX opcodes and MMIXAL pseudo-ops into the trie 64⟩ Used in section 61.
⟨Put the special register names into the trie 66⟩ Used in section 61.
⟨Report an undefined symbol 79⟩ Used in section 74.
⟨Report any undefined local symbols 145⟩ Used in section 142.
⟨Scan a backward local 89⟩ Used in section 86.
⟨Scan a binary operator or closing token, rt_op 97⟩ Used in section 85.
⟨Scan a character constant 92⟩ Used in section 86.
⟨Scan a decimal constant 94⟩ Used in section 86.
⟨Scan a forward local 88⟩ Used in section 86.
⟨Scan a hexadecimal constant 95⟩ Used in section 86.
⟨Scan a string constant 93⟩ Used in section 86.
⟨Scan a symbol 87⟩ Used in section 86.
⟨Scan opening tokens until putting something on val_stack 86⟩ Used in section 85.
⟨Scan the current location 96⟩ Used in section 86.
⟨Scan the label field; goto bypass if there is none 103⟩ Used in section 102.
⟨Scan the opcode field; goto bypass if there is none 104⟩ Used in section 102.
⟨Scan the operand field 85⟩ Used in section 102.
⟨Subroutines 28, 41, 42, 44, 45, 47, 48, 49, 50, 52, 55, 57, 59, 73, 74⟩ Used in section 136.
⟨Type definitions 26, 30, 54, 58, 62, 68, 82⟩ Used in section 136.
⟨Visit t and traverse t->mid 75⟩ Used in section 74.



MMMIX 

1. Introduction. This CWEB program simulates how the MMIX computer might 
be implemented with a high-performance pipeline in many different configurations. 
All of the complexities of MMIX’s architecture are treated, except for multiprocessing 
and low-level details of memory mapped input/output. 

The present program module, which contains the main routine for the MMIX meta- 
simulator, is primarily devoted to administrative tasks. Other modules do the actual 
work after this module has told them what to do. 

2. A user typically invokes the meta-simulator with a UNIX-like command line of
the general form ‘mmmix configfile progfile’, where the configfile describes the
characteristics of an MMIX implementation and the progfile contains a program to
be downloaded and run. Rules for configuration files appear in the module called
mmix-config. The program file is either an “MMIX binary file” dumped by MMIX-SIM,
or an ASCII text file that describes hexadecimal data in a rudimentary format.
It is assumed to be binary if its name ends with the extension ‘.mmb’.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mmix-pipe.h"
char *config_file_name, *prog_file_name;
⟨Global variables 5⟩
⟨Subroutines 10⟩
int main(argc, argv)
  int argc;
  char *argv[];
{
  ⟨Parse the command line 3⟩;
  MMIX_config(config_file_name);
  MMIX_init();
  mmix_io_init();
  ⟨Input the program 4⟩;
  ⟨Run the simulation interactively 13⟩;
  printf("Simulation ended at time %d.\n", ticks.l);
  print_stats();
  return 0;
}

3. The command line might also contain options, some day. For now I’m forgetting 
them and simplifying everything until I gain further experience. 

⟨Parse the command line 3⟩ ≡
  if (argc != 3) {
    fprintf(stderr, "Usage: %s configfile progfile\n", argv[0]);
    exit(-3);
  }
  config_file_name = argv[1];
  prog_file_name = argv[2];

This code is used in section 2.

D.E. Knuth: MMIXware, LNCS 1750, pp. 494-509, 2014.

DOI: 10.1007/3-540-46611-8_10 © Author and Springer-Verlag Berlin Heidelberg 2014

4. ⟨Input the program 4⟩ ≡
  if (strlen(prog_file_name) > 4 &&
      strcmp(prog_file_name + strlen(prog_file_name) - 4, ".mmb") == 0)
    ⟨Input an MMIX binary file 9⟩
  else ⟨Input a rudimentary hexadecimal file 6⟩;
  fclose(prog_file);

This code is used in section 2. 
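
The extension test in section 4 can be isolated as a small predicate. The following is only a sketch; the name has_suffix is hypothetical and does not appear in MMMIX.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when name is strictly longer than suffix and
   ends with it, mirroring the strlen/strcmp test of section 4. */
static bool has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}
```

Because the length comparison is strict, a file named exactly ‘.mmb’ would be treated as a hexadecimal file, just as in the test above.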



exit: void (), <stdlib.h>.
fclose: int (), <stdio.h>.
fprintf: int (), <stdio.h>.
l: tetra, MMIX-PIPE §17.
MMIX_config: void (), MMIX-CONFIG §38.
MMIX_init: void (), MMIX-PIPE §10.
mmix_io_init: void (), MMIX-IO §7.
print_stats: void (), MMIX-PIPE §162.
printf: int (), <stdio.h>.
prog_file: FILE *, §5.
stderr: FILE *, <stdio.h>.
strcmp: int (), <string.h>.
strlen: size_t (), <string.h>.
ticks: Extern octa, MMIX-PIPE §87.
5. Hexadecimal input to memory. A rudimentary hexadecimal input format
is implemented here so that the simulator can be run with essentially arbitrary data
in the simulated memory. The rules of this format are extremely simple: Each line
of the file either begins with (i) 12 hexadecimal digits followed by a colon; or (ii) a
space followed by 16 hexadecimal digits. In case (i), the 12 hex digits specify a 48-bit
physical address, called the current location. In case (ii), the 16 hex digits specify an
octabyte to be stored in the current location; the current location is then increased
by 8. The current location should be a multiple of 8, but its three least significant
bits are actually ignored. Arbitrary comments can follow the specification of a new
current location or a new octabyte, as long as each line is less than 99 characters long.
For example, the file

0123456789ab: SILLY EXAMPLE
 0123456789abcdef first octabyte
 fedcba9876543210 second

places the octabyte #0123456789abcdef into memory location #0123456789a8 and
#fedcba9876543210 into location #0123456789b0.
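
The two kinds of lines can be told apart and decoded with sscanf, as the code of sections 6 to 8 does. The sketch below is a simplified stand-in: the type octa_sketch and the function parse_hex_line are hypothetical names, and the length check before looking at position 12 is an added precaution that the simulator itself does not need.

```c
#include <stdio.h>
#include <string.h>

typedef struct { unsigned int h, l; } octa_sketch; /* high and low tetrabytes */

/* Hypothetical helper: classify one line of the rudimentary format.
   Returns 1 for a new current location (stored in *loc), 2 for an
   octabyte of data (stored in *dat), and 0 for an improper line. */
static int parse_hex_line(const char *line, octa_sketch *loc, octa_sketch *dat)
{
    if (strlen(line) > 12 && line[12] == ':')
        return sscanf(line, "%4x%8x", &loc->h, &loc->l) == 2 ? 1 : 0;
    if (line[0] == ' ')
        return sscanf(line + 1, "%8x%8x", &dat->h, &dat->l) == 2 ? 2 : 0;
    return 0;
}
```

Applied to the first line of the example, this yields h = #0123 and l = #456789ab; clearing the three low-order bits of l gives the address #0123456789a8 at which the first octabyte lands.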

#define BUF_SIZE 100

⟨Global variables 5⟩ ≡
  octa cur_loc;
  octa cur_dat;
  bool new_chunk;
  char buffer[BUF_SIZE];
  FILE *prog_file;

See also sections 16 and 25. 

This code is used in section 2. 

6. ⟨Input a rudimentary hexadecimal file 6⟩ ≡
  {
    prog_file = fopen(prog_file_name, "r");
    if (!prog_file) {
      fprintf(stderr, "Panic: Can't open MMIX hexadecimal file %s!\n", prog_file_name);
      exit(-3);
    }
    new_chunk = true;
    while (1) {
      if (!fgets(buffer, BUF_SIZE, prog_file)) break;
      if (buffer[strlen(buffer) - 1] != '\n') {
        fprintf(stderr, "Panic: Hexadecimal file line too long: `%s...'!\n", buffer);
        exit(-3);
      }
      if (buffer[12] == ':') ⟨Change the current location 7⟩
      else if (buffer[0] == ' ') ⟨Read an octabyte and advance cur_loc 8⟩
      else {
        fprintf(stderr, "Panic: Improper hexadecimal file line: `%s'!\n", buffer);
        exit(-3);
      }
    }
  }

This code is used in section 4. 

7. ⟨Change the current location 7⟩ ≡
  {
    if (sscanf(buffer, "%4x%8x", &cur_loc.h, &cur_loc.l) != 2) {
      fprintf(stderr, "Panic: Improper hexadecimal file location: `%s'!\n", buffer);
      exit(-3);
    }
    new_chunk = true;
  }

This code is used in section 6. 

8. ⟨Read an octabyte and advance cur_loc 8⟩ ≡
  {
    if (sscanf(buffer + 1, "%8x%8x", &cur_dat.h, &cur_dat.l) != 2) {
      fprintf(stderr, "Panic: Improper hexadecimal file data: `%s'!\n", buffer);
      exit(-3);
    }
    if (new_chunk) mem_write(cur_loc, cur_dat);
    else mem_hash[last_h].chunk[(cur_loc.l & 0xffff) >> 3] = cur_dat;
    cur_loc.l += 8;
    if ((cur_loc.l & 0xfff8) != 0) new_chunk = false;
    else {
      new_chunk = true;
      if ((cur_loc.l & 0xffff0000) == 0) cur_loc.h++;
    }
  }

This code is used in section 6. 
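
The shortcut in section 8 writes directly into the most recently hashed chunk, calling mem_write only when a chunk boundary has been crossed. A minimal sketch of the index arithmetic follows; it assumes chunks of 2^16 bytes (masks #ffff and #fff8), and the function names are hypothetical.

```c
/* Hypothetical helpers, assuming memory chunks of 2^16 bytes,
   that is, 8192 octabytes per chunk. */

/* Position of an octabyte within its chunk, from the low tetra of its address. */
static unsigned int chunk_index(unsigned int lo)
{
    return (lo & 0xffffu) >> 3;
}

/* Nonzero when the address is the first octabyte of a chunk, so that
   the next store must go through the general memory-write routine. */
static int starts_new_chunk(unsigned int lo)
{
    return (lo & 0xfff8u) == 0;
}
```

For the address #0123456789a8 of the earlier example, chunk_index gives #1135; the last octabyte of a chunk has index 8191.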



bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
exit: void (), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgets: char *(), <stdio.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void (), MMIX-PIPE §213.
octa = struct, MMIX-PIPE §17.
prog_file_name: char *, §2.
sscanf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strlen: size_t (), <string.h>.
true = 1, MMIX-PIPE §11.
9. Binary input to memory. When the program file was dumped by MMIX-SIM,
it has the simple format discussed in exercise 1.4.3′-20 of the MMIX fascicle
[The Art of Computer Programming, Volume 1, Fascicle 1]. We assume that such a
program has text, data, pool, and stack segments, as in the conventions of that book.
We load it into four 2^32-byte pages of physical memory, one for each segment; page
zero of segment i is mapped to physical location 2^32 i. Page tables are kept in physical
locations starting at 2^32 × 4; static traps begin at 2^32 × 5 and dynamic traps at 2^32 × 6.
(These conventions agree with the special register settings rT = #8000000500000000,
rTT = #8000000600000000, rV = #369c200400000000 assumed by the stripped-down
simulator.)
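
The "trivial mapping function" amounts to reading the segment number from bits 29 and 30 of the high tetrabyte of a user address. Here is a sketch under that reading; the function segment_page is a hypothetical name.

```c
/* Hypothetical helper: map the high tetra of a virtual address to the
   physical page number 0..3 of its segment (text, data, pool, stack).
   Returns -1 when any bit other than the two segment bits is set,
   the case that the loader flags as a bad address. */
static int segment_page(unsigned int hi)
{
    if (hi & 0x9fffffffu) return -1;
    return (int)(hi >> 29);
}
```

Thus the data segment, whose addresses begin at #2000000000000000, is loaded at physical location 2^32 × 1, and a negative (kernel) address is rejected.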

⟨Input an MMIX binary file 9⟩ ≡
  {
    prog_file = fopen(prog_file_name, "rb");
    if (!prog_file) {
      fprintf(stderr, "Panic: Can't open MMIX binary file %s!\n", prog_file_name);
      exit(-3);
    }
    while (1) {
      if (!undump_octa()) break;
      new_chunk = true;
      cur_loc = cur_dat;
      if (cur_loc.h & 0x9fffffff) bad_address = true;
      else bad_address = false, cur_loc.h >>= 29;
        /* apply trivial mapping function for each segment */
      ⟨Input consecutive octabytes beginning at cur_loc 11⟩;
    }
    ⟨Set up the canned environment 12⟩;
  }

This code is used in section 4. 

10. The undump_octa routine reads eight bytes from the binary file prog_file into
the global octabyte cur_dat, taking care as usual to be big-endian regardless of the
host computer’s bias.

⟨Subroutines 10⟩ ≡
  static bool undump_octa ARGS((void));
  static bool undump_octa()
  {
    register int t0, t1, t2, t3;
    t0 = fgetc(prog_file); if (t0 == EOF) return false;
    t1 = fgetc(prog_file); if (t1 == EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 == EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 == EOF) goto oops;
    cur_dat.h = (t0 << 24) + (t1 << 16) + (t2 << 8) + t3;
    t0 = fgetc(prog_file); if (t0 == EOF) goto oops;
    t1 = fgetc(prog_file); if (t1 == EOF) goto oops;
    t2 = fgetc(prog_file); if (t2 == EOF) goto oops;
    t3 = fgetc(prog_file); if (t3 == EOF) goto oops;
    cur_dat.l = (t0 << 24) + (t1 << 16) + (t2 << 8) + t3;
    return true;
  oops: fprintf(stderr, "Premature end of file on %s!\n", prog_file_name);
    return false;
  }

See also sections 17 and 20. 

This code is used in section 2. 
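
The big-endian assembly performed by undump_octa does not depend on the host's byte order, because it combines the bytes arithmetically instead of copying memory. A minimal sketch of the same idea on a byte array follows; the name be_tetra and the use of uint32_t are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: combine four bytes, most significant first,
   into one 32-bit tetrabyte, regardless of the host's endianness. */
static uint32_t be_tetra(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) + ((uint32_t)b[1] << 16)
         + ((uint32_t)b[2] << 8) + (uint32_t)b[3];
}
```

Casting each byte to uint32_t before shifting avoids shifting a set bit into the sign position of an int, a point on which the original code relies on the host's behavior.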

11. ⟨Input consecutive octabytes beginning at cur_loc 11⟩ ≡
  while (1) {
    if (!undump_octa()) {
      fprintf(stderr, "Unexpected end of file on %s!\n", prog_file_name);
      break;
    }
    if (!(cur_dat.h || cur_dat.l)) break;
    if (bad_address) {
      fprintf(stderr, "Panic: Unsupported virtual address %08x%08x!\n", cur_loc.h,
          cur_loc.l);
      exit(-5);
    }
    if (new_chunk) mem_write(cur_loc, cur_dat);
    else mem_hash[last_h].chunk[(cur_loc.l & 0xffff) >> 3] = cur_dat;
    cur_loc.l += 8;
    if ((cur_loc.l & 0xfff8) != 0) new_chunk = false;
    else {
      new_chunk = true;
      if ((cur_loc.l & 0xffff0000) == 0) {
        bad_address = true;
        cur_loc.h = (cur_loc.h << 29) + 1;
      }
    }
  }

This code is used in section 9. 



ARGS = macro ( ), MMIX-PIPE §6.
bad_address: bool, §25.
bool = enum, MMIX-PIPE §11.
chunk: octa *, MMIX-PIPE §206.
cur_dat: octa, §5.
cur_loc: octa, §5.
EOF = (-1), <stdio.h>.
exit: void (), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fgetc: int (), <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
h: tetra, MMIX-PIPE §17.
l: tetra, MMIX-PIPE §17.
last_h: int, MMIX-PIPE §211.
mem_hash: chunknode *, MMIX-PIPE §207.
mem_write: void (), MMIX-PIPE §213.
new_chunk: bool, §5.
prog_file: FILE *, §5.
prog_file_name: char *, §2.
stderr: FILE *, <stdio.h>.
true = 1, MMIX-PIPE §11.
12. The primitive operating system assumed in simple programs of The Art of
Computer Programming will set up text segment, data segment, pool segment, and
stack segment as in MMIX-SIM. The runtime stack will be initialized if we UNSAVE
from the last location loaded in the .mmb file.

#define rQ 16

⟨Set up the canned environment 12⟩ ≡
  if (cur_loc.h != 3) {
    fprintf(stderr, "Panic: MMIX binary file didn't set up the stack!\n");
    exit(-6);
  }
  inst_ptr.o = mem_read(incr(cur_loc, -8 * 14)); /* Main */
  inst_ptr.p = NULL;
  cur_loc.h = 0x60000000;
  g[255].o = incr(cur_loc, -8); /* place to UNSAVE */
  cur_dat.l = 0x90;
  if (mem_read(cur_dat).h) inst_ptr.o = cur_dat; /* start at #90 if nonzero */
  head->inst = (UNSAVE << 24) + 255, tail--; /* prefetch a fabricated command */
  head->loc = incr(inst_ptr.o, -4); /* in case the UNSAVE is interrupted */
  g[rT].o.h = 0x80000005, g[rTT].o.h = 0x80000006;
  cur_dat.h = (RESUME << 24) + 1, cur_dat.l = 0, cur_loc.h = 5, cur_loc.l = 0;
  mem_write(cur_loc, cur_dat); /* the primitive trap handler */
  cur_dat.l = cur_dat.h, cur_dat.h = (NEGI << 24) + (255 << 16) + 1;
  cur_loc.h = 6, cur_loc.l = 8;
  mem_write(cur_loc, cur_dat); /* the primitive dynamic trap handler */
  cur_dat.h = (GET << 24) + rQ, cur_dat.l = (PUTI << 24) + (rQ << 16), cur_loc.l = 0;
  mem_write(cur_loc, cur_dat); /* more of the primitive dynamic trap handler */
  cur_dat.h = 0, cur_dat.l = 7; /* generate a PTE with rwx permission */
  cur_loc.h = 4; /* beginning of skeleton page table */
  mem_write(cur_loc, cur_dat); /* PTE for the text segment */
  ITcache->set[0][0].tag = zero_octa;
  ITcache->set[0][0].data[0] = cur_dat; /* prime the IT cache */
  cur_dat.l = 6; /* PTE with read and write permission only */
  cur_dat.h = 1, cur_loc.l = 3 << 13;
  mem_write(cur_loc, cur_dat); /* PTE for the data segment */
  cur_dat.h = 2, cur_loc.l = 6 << 13;
  mem_write(cur_loc, cur_dat); /* PTE for the pool segment */
  cur_dat.h = 3, cur_loc.l = 9 << 13;
  mem_write(cur_loc, cur_dat); /* PTE for the stack segment */
  g[rK].o = neg_one; /* enable all interrupts */
  g[rV].o.h = 0x369c2004;
  page_bad = false, page_r = 4 << (32 - 13), page_s = 32, page_mask.l = 0xffffffff;
  page_b[1] = 3, page_b[2] = 6, page_b[3] = 9, page_b[4] = 12;

This code is used in section 9. 
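
The page parameters set at the end of section 12 are exactly the fields of rV = #369c200400000000. The sketch below decodes them, assuming the usual rV layout of four 4-bit fields b1, b2, b3, b4, an 8-bit page-size exponent s, and a 27-bit root field r (followed by n and f, which are not decoded here). The names rv_fields and decode_rV are hypothetical.

```c
#include <stdint.h>

/* Hypothetical record of the rV fields used by the canned environment. */
typedef struct {
    unsigned int b1, b2, b3, b4; /* segment boundaries in the page table */
    unsigned int s;              /* page size is 2^s bytes */
    uint64_t r;                  /* page table root is at 2^13 * r */
} rv_fields;

/* Decode rV, assuming the layout b1 b2 b3 b4 (4 bits each), s (8 bits),
   r (27 bits), n (10 bits), f (3 bits), from most to least significant. */
static rv_fields decode_rV(uint64_t rV)
{
    rv_fields f;
    f.b1 = (unsigned int)((rV >> 60) & 0xf);
    f.b2 = (unsigned int)((rV >> 56) & 0xf);
    f.b3 = (unsigned int)((rV >> 52) & 0xf);
    f.b4 = (unsigned int)((rV >> 48) & 0xf);
    f.s = (unsigned int)((rV >> 40) & 0xff);
    f.r = (rV >> 13) & 0x7ffffff;
    return f;
}
```

For the value above this gives b1 = 3, b2 = 6, b3 = 9, b4 = 12, s = 32, and r = 4 × 2^19, so the page table root 2^13 r is the physical location 2^32 × 4 mentioned in section 9.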
cur_dat: octa, §5.
cur_loc: octa, §5.
data: octa *, MMIX-PIPE §167.
exit: void (), <stdlib.h>.
false = 0, MMIX-PIPE §11.
fprintf: int (), <stdio.h>.
g: int, MMIX-PIPE §167.
GET = #fe, MMIX-PIPE §47.
h: tetra, MMIX-PIPE §17.
head: fetch *, MMIX-PIPE §69.
incr: octa (), MMIX-ARITH §6.
inst: tetra, MMIX-PIPE §68.
inst_ptr: spec, MMIX-PIPE §284.
ITcache: cache *, MMIX-PIPE §168.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
mem_read: octa (), MMIX-PIPE §210.
mem_write: void (), MMIX-PIPE §213.
neg_one: octa, MMIX-ARITH §4.
NEGI = #35, MMIX-PIPE §47.
o: octa, MMIX-PIPE §40.
p: specnode *, MMIX-PIPE §40.
page_b: int [], MMIX-PIPE §238.
page_bad: bool, MMIX-PIPE §238.
page_mask: octa, MMIX-PIPE §238.
page_r: int, MMIX-PIPE §238.
page_s: int, MMIX-PIPE §238.
PUTI = #f7, MMIX-PIPE §47.
RESUME = #f9, MMIX-PIPE §47.
rK = 15, MMIX-PIPE §52.
rT = 13, MMIX-PIPE §52.
rTT = 14, MMIX-PIPE §52.
rV = 18, MMIX-PIPE §52.
set: cacheset *, MMIX-PIPE §167.
stderr: FILE *, <stdio.h>.
tag: octa, MMIX-PIPE §167.
tail: fetch *, MMIX-PIPE §69.
UNSAVE = #fb, MMIX-PIPE §47.
zero_octa: octa, MMIX-ARITH §4.
13. Interaction. When prompted for instructions, this simulator understands
the following terse commands:

• ⟨positive integer⟩: Run for this many clock cycles.

• @⟨hexadecimal integer⟩: Set the instruction pointer to this virtual address;
successive instructions will be fetched from here.

• k: Toggle the sign bit of the instruction pointer.

• b⟨hexadecimal integer⟩: Set the breakpoint to this virtual address; simulation will
pause when an instruction from the breakpoint address enters the fetch buffer.

• v⟨hexadecimal integer⟩: Set the desired level of diagnostic output; each bit in the
hexadecimal integer enables certain printouts when the simulator is running. Bit
#1 shows instructions when issued, deissued, or committed; #2 shows the pipeline
and locks after each cycle; #4 shows each coroutine activation; #8 each coroutine
scheduling; #10 reports when reading from an uninitialized chunk of memory; #20
asks for online input when reading from addresses ≥ 2^48; #40 reports all I/O to
memory address ≥ 2^48; #80 shows details of branch prediction; #100 displays full
cache contents including blocks with invalid tags.

• -⟨integer⟩: Deissue this many instructions.

• l⟨integer⟩ or g⟨integer⟩: Show current “hot” contents of a local or global register.

• m⟨hexadecimal integer⟩: Show current contents of a physical memory address.
(This value may not be up to date; newer values might appear in the write buffer
and/or in the caches.)

• f⟨hexadecimal integer⟩: Insert a tetrabyte into the fetch buffer. (Use with care!)

• i⟨integer⟩: Set the interval counter rI to the given value; this will trigger an
interrupt after the specified number of cycles.

• IT, DT, I, D, or S: Show current contents of a cache.

• D* or S*: Show dirty blocks of a cache.

• p: Show current contents of the pipeline.

• s: Show current statistics on branch prediction and speed of instruction issue.

• h: Help (show the possibilities for interaction).

• q: Quit.
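
Since the argument of the v command is a bit mask, several diagnostics can be requested at once; v101, for instance, asks for instruction tracing together with full cache displays. The following sketch spells out a mask. The table verbose_names and the function describe_verbose are hypothetical, and the descriptions merely paraphrase the list above.

```c
#include <stdio.h>

/* Paraphrased diagnostic names; bit i of the mask selects entry i. */
static const char *const verbose_names[] = {
    "instructions issued, deissued, or committed",  /* #1 */
    "pipeline and locks after each cycle",          /* #2 */
    "coroutine activations",                        /* #4 */
    "coroutine scheduling",                         /* #8 */
    "reads from uninitialized chunks of memory",    /* #10 */
    "online input for reads at addresses >= 2^48",  /* #20 */
    "all I/O to memory addresses >= 2^48",          /* #40 */
    "details of branch prediction",                 /* #80 */
    "full cache contents, including invalid tags"   /* #100 */
};

/* Hypothetical helper: print the diagnostics enabled by mask on out,
   one per line, and return how many were enabled. */
static int describe_verbose(unsigned int mask, FILE *out)
{
    int i, count = 0;
    int n = (int)(sizeof verbose_names / sizeof verbose_names[0]);
    for (i = 0; i < n; i++)
        if (mask & (1u << i)) {
            fprintf(out, "#%x: %s\n", 1u << i, verbose_names[i]);
            count++;
        }
    return count;
}
```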

⟨Run the simulation interactively 13⟩ ≡
  while (1) {
    printf("mmmix> "); fflush(stdout);
    fgets(buffer, BUF_SIZE, stdin);
    switch (buffer[0]) {
    default: what_say:
      printf("Eh? Sorry, I don't understand. (Type h for help)\n");
      continue;
    case 'q': case 'x': goto done;
    ⟨Cases for interaction 14⟩
    }
  }
  done:

This code is used in section 2. 

14. ⟨Cases for interaction 14⟩ ≡
  case 'h': case '?': printf("The interactive commands are as follows:\n");
    printf(" <n> to run for n cycles\n");
    printf(" @<x> to take next instruction from location x\n");
    printf(" k    to change the sign bit of the instruction location\n");
    printf(" b<x> to pause when location x is fetched\n");
    printf(" v<x> to print specified diagnostics when running;\n");
    printf("    x=1[insts enter/leave pipe]+2[whole pipeline each cycle]+\n");
    printf("      4[coroutine activations]+8[coroutine scheduling]+\n");
    printf("      10[uninitialized read]+20[online I/O read]+\n");
    printf("      40[I/O read/write]+80[branch prediction details]+\n");
    printf("      100[invalid cache blocks displayed too]\n");
    printf(" -<n> to deissue n instructions\n");
    printf(" l<n> to print current value of local register n\n");
    printf(" g<n> to print current value of global register n\n");
    printf(" m<x> to print current value of memory address x\n");
    printf(" f<x> to insert instruction x into the fetch buffer\n");
    printf(" i<n> to initiate a timer interrupt after n cycles\n");
    printf(" IT, DT, I, D, or S to print current cache contents\n");
    printf(" D* or S* to print dirty blocks of a cache\n");
    printf(" p to print current pipeline contents\n");
    printf(" s to print current stats\n");
    printf(" h to print this message\n");
    printf(" q to exit\n");
    printf("(Here <n> is a decimal integer, <x> is hexadecimal.)\n");
    continue;

See also sections 15, 18, 19, 21, 22, 23, and 24. 

This code is used in section 13. 



BUF_SIZE = 100, §5.
buffer: char [], §5.
fflush: int (), <stdio.h>.
fgets: char *(), <stdio.h>.
printf: int (), <stdio.h>.
stdin: FILE *, <stdio.h>.
stdout: FILE *, <stdio.h>.
15. ⟨Cases for interaction 14⟩ +≡
  case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7':
  case '8': case '9':
    if (sscanf(buffer, "%d", &n) != 1) goto what_say;
    printf("Running %d at time %d", n, ticks.l);
    if (bp.h == (tetra) -1 && bp.l == (tetra) -1) printf("\n");
    else printf(" with breakpoint %08x%08x\n", bp.h, bp.l);
    MMIX_run(n, bp); continue;
  case '@': inst_ptr.o = read_hex(buffer + 1); goto new_inst_ptr;
  case 'k': inst_ptr.o.h ^= 0x80000000; /* shortcut to kernel mode */
    if (!ticks.l && head) head->loc.h ^= 0x80000000; /* fix the UNSAVE loc */
  new_inst_ptr: if (inst_ptr.o.h & 0x80000000) g[rK].o.h &= -2;
      /* disable interrupts on P_BIT */
    inst_ptr.p = NULL; continue;
  case 'b': bp = read_hex(buffer + 1); continue;
  case 'v': verbose = read_hex(buffer + 1).l; continue;

16. ⟨Global variables 5⟩ +≡
  int n, m; /* temporary integer */
  octa bp = {-1, -1}; /* breakpoint */
  octa tmp; /* an octabyte of temporary interest */
  static unsigned char d[BUF_SIZE];

17. Here’s a simple program to read an octabyte in hexadecimal notation from a 
buffer. It changes the buffer by storing a null character after the input. 

⟨Subroutines 10⟩ +≡
  octa read_hex ARGS((char *));
  octa read_hex(p)
    char *p;
  {
    register int j, k;
    octa val;
    val.h = val.l = 0;
    for (j = 0; ; j++) {
      if (p[j] >= '0' && p[j] <= '9') d[j] = p[j] - '0';
      else if (p[j] >= 'a' && p[j] <= 'f') d[j] = p[j] - 'a' + 10;
      else if (p[j] >= 'A' && p[j] <= 'F') d[j] = p[j] - 'A' + 10;
      else break;
    }
    p[j] = '\0';
    for (j--, k = 0; k <= j; k++) {
      if (k >= 8) val.h += d[j - k] << (4 * k - 32);
      else val.l += d[j - k] << (4 * k);
    }
    return val;
  }
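
For experimentation outside the simulator, the same conversion can be written as a single left-to-right pass. This is a variant, not the routine of section 17: it leaves the input string untouched, it silently keeps only the last 16 digits of an overlong number, and the names octa_sketch and read_hex_sketch are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch; /* high and low tetrabytes */

/* Hypothetical variant of read_hex: accumulate hexadecimal digits from
   left to right, shifting the 64-bit value left by one nybble per digit.
   Scanning stops at the first character that is not a hex digit. */
static octa_sketch read_hex_sketch(const char *p)
{
    octa_sketch val = {0, 0};
    for (; ; p++) {
        uint32_t digit;
        if (*p >= '0' && *p <= '9') digit = (uint32_t)(*p - '0');
        else if (*p >= 'a' && *p <= 'f') digit = (uint32_t)(*p - 'a' + 10);
        else if (*p >= 'A' && *p <= 'F') digit = (uint32_t)(*p - 'A' + 10);
        else break;
        val.h = (val.h << 4) | (val.l >> 28); /* carry the top nybble upward */
        val.l = (val.l << 4) | digit;
    }
    return val;
}
```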

18. ⟨Cases for interaction 14⟩ +≡
  case '-': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (cool <= hot) m = hot - cool;
    else m = (hot - reorder_bot) + 1 + (reorder_top - cool);
    if (n > m) deissues = m; else deissues = n;
    continue;
  case 'l': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (n >= lring_size) goto what_say;
    printf("  l[%d]=%08x%08x\n", n, l[n].o.h, l[n].o.l); continue;
  case 'm': tmp = mem_read(read_hex(buffer + 1));
    printf("  m[%s]=%08x%08x\n", buffer + 1, tmp.h, tmp.l); continue;
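
The computation of m in the ‘-’ case counts the instructions currently held in a circular reorder buffer, and the deissue request is then clamped to that count. The sketch below restates it with integer slot numbers in place of pointers; it assumes, as the formula suggests, that hot and cool move downward and wrap from the bottom slot to the top one. The function names are hypothetical.

```c
/* Hypothetical helper: number of issued instructions in a circular buffer
   whose slots are numbered bot..top, where hot is the oldest instruction
   and cool is the next free slot, both moving downward with wraparound. */
static int issued_count(int hot, int cool, int bot, int top)
{
    if (cool <= hot) return hot - cool;
    return (hot - bot) + 1 + (top - cool);
}

/* Hypothetical helper: never deissue more instructions than are present. */
static int clamp_deissues(int requested, int present)
{
    return requested > present ? present : requested;
}
```

With slots 0 to 7, hot = 1 and cool = 6, the occupied slots are 1, 0, and 7, so three instructions could be deissued.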

19. The register stack pointers, rO and rS, are not kept up to date in the g array. 
Therefore we have to deduce their values by examining the pipeline. 

⟨Cases for interaction 14⟩ +≡
  case 'g': if (sscanf(buffer + 1, "%d", &n) != 1 || n < 0) goto what_say;
    if (n >= 256) goto what_say;
    if (n == rO || n == rS) {
      if (hot == cool) /* pipeline empty */
        g[rO].o = sl3(cool_O), g[rS].o = sl3(cool_S);
      else g[rO].o = sl3(hot->cur_O), g[rS].o = sl3(hot->cur_S);
    }
    printf("  g[%d]=%08x%08x\n", n, g[n].o.h, g[n].o.l);
    continue;

20. ⟨Subroutines 10⟩ +≡
  static octa sl3 ARGS((octa));
  static octa sl3(y) /* shift left by 3 bits */
    octa y;
  {
    register tetra yhl = y.h << 3, ylh = y.l >> 29;
    y.h = yhl + ylh; y.l <<= 3;
    return y;
  }
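
The shift in sl3 has to carry the three high-order bits of the low tetrabyte into the high tetrabyte. A self-contained sketch follows, with hypothetical names and uint32_t standing in for tetra; the shift by 3 presumably converts a pointer kept in octabyte units into a byte address.

```c
#include <stdint.h>

typedef struct { uint32_t h, l; } octa_sketch; /* high and low tetrabytes */

/* Hypothetical stand-in for sl3: shift a 64-bit value, held as two
   32-bit halves, left by 3 bits. */
static octa_sketch sl3_sketch(octa_sketch y)
{
    uint32_t yhl = y.h << 3;  /* high half, shifted */
    uint32_t ylh = y.l >> 29; /* three bits carried out of the low half */
    y.h = yhl + ylh;
    y.l <<= 3;
    return y;
}
```

The two summands of the high half occupy disjoint bit positions, so the addition never produces a carry and could equally be written as a bitwise or.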



ARGS = macro ( ), MMIX-PIPE §6.
BUF_SIZE = 100, §5.
buffer: char [], §5.
cool: control *, MMIX-PIPE §60.
cool_O: octa, MMIX-PIPE §98.
cool_S: octa, MMIX-PIPE §98.
cur_O: octa, MMIX-PIPE §44.
cur_S: octa, MMIX-PIPE §44.
deissues: int, MMIX-PIPE §60.
g: int, MMIX-PIPE §167.
h: tetra, MMIX-PIPE §17.
head: fetch *, MMIX-PIPE §69.
hot: control *, MMIX-PIPE §60.
inst_ptr: spec, MMIX-PIPE §284.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
lring_size: int, MMIX-PIPE §86.
mem_read: octa (), MMIX-PIPE §210.
MMIX_run: void (), MMIX-PIPE §10.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §17.
p: specnode *, MMIX-PIPE §40.
P_BIT = 1 << 0, MMIX-PIPE §54.
printf: int (), <stdio.h>.
reorder_bot: control *, MMIX-PIPE §60.
reorder_top: control *, MMIX-PIPE §60.
rK = 15, MMIX-PIPE §52.
rO = 10, MMIX-PIPE §52.
rS = 11, MMIX-PIPE §52.
sscanf: int (), <stdio.h>.
tetra = unsigned int, MMIX-PIPE §17.
ticks: Extern octa, MMIX-PIPE §87.
verbose: int, MMIX-PIPE §4.
what_say: label, §13.
21. ⟨Cases for interaction 14⟩ +≡
  case 'I': print_cache(buffer[1] == 'T' ? ITcache : Icache, false); continue;
  case 'D': print_cache(buffer[1] == 'T' ? DTcache : Dcache,
        buffer[1] == '*'); continue;
  case 'S': print_cache(Scache, buffer[1] == '*'); continue;
  case 'p': print_pipe(); print_locks(); continue;
  case 's': print_stats(); continue;
  case 'i': if (sscanf(buffer + 1, "%d", &n) == 1) g[rI].o = incr(zero_octa, n);
    continue;

22. ⟨Cases for interaction 14⟩ +≡
  case 'f': tmp = read_hex(buffer + 1);
    {
      register fetch *new_tail;
      if (tail == fetch_bot) new_tail = fetch_top;
      else new_tail = tail - 1;
      if (new_tail == head) printf("Sorry, the fetch buffer is full!\n");
      else {
        tail->loc = inst_ptr.o;
        tail->inst = tmp.l;
        tail->interrupt = 0;
        tail->noted = false;
        tail = new_tail;
      }
      continue;
    }

23. A hidden case here, for me when debugging. It essentially disables the transla- 
tion caches, by mapping everything to zero. 

⟨Cases for interaction 14⟩ +≡
  case 'd': if (ticks.l)
      printf("Sorry: I disable ITcache and DTcache only at the beginning!\n");
    else {
      ITcache->set[0][0].tag = zero_octa;
      ITcache->set[0][0].data[0] = seven_octa;
      DTcache->set[0][0].tag = zero_octa;
      DTcache->set[0][0].data[0] = seven_octa;
      g[rK].o = neg_one;
      page_bad = false;
      page_mask = neg_one;
      inst_ptr.p = (specnode *) 1;
    } continue;
24. And another case, for me when kludging. At the moment, it simply lists the
functional unit names. But I might decide to put other stuff here when giving a demo.

⟨Cases for interaction 14⟩ +≡
  case '!': {
      register int j;
      for (j = 0; j < funit_count; j++) printf("unit %s %d\n", funit[j].name, funit[j].k);
    }
    continue;

25. ⟨Global variables 5⟩ +≡
  bool bad_address;
  extern bool page_bad;
  extern octa page_mask;
  extern int page_r, page_s, page_b[5];
  extern octa zero_octa;
  extern octa neg_one;
  octa seven_octa = {0, 7};
  extern octa incr ARGS((octa y, int delta)); /* unsigned y + δ (δ is signed) */
  extern void mmix_io_init ARGS((void));
  extern void MMIX_config ARGS((char *));



ARGS = macro ( ), MMIX-PIPE §6.
bool = enum, MMIX-PIPE §11.
buffer: char [], §5.
data: octa *, MMIX-PIPE §167.
Dcache: cache *, MMIX-PIPE §168.
DTcache: cache *, MMIX-PIPE §168.
false = 0, MMIX-PIPE §11.
fetch = struct, MMIX-PIPE §68.
fetch_bot: fetch *, MMIX-PIPE §69.
fetch_top: fetch *, MMIX-PIPE §69.
funit: func *, MMIX-PIPE §77.
funit_count: int, MMIX-PIPE §77.
g: int, MMIX-PIPE §167.
head: fetch *, MMIX-PIPE §69.
Icache: cache *, MMIX-PIPE §168.
incr: octa (), MMIX-ARITH §6.
inst: tetra, MMIX-PIPE §68.
inst_ptr: spec, MMIX-PIPE §284.
interrupt: unsigned int, MMIX-PIPE §68.
ITcache: cache *, MMIX-PIPE §168.
k: register int, §17.
l: tetra, MMIX-PIPE §17.
loc: octa, MMIX-PIPE §44.
MMIX_config: void (), MMIX-CONFIG §38.
mmix_io_init: void (), MMIX-IO §7.
n: int, §16.
name: char *, MMIX-PIPE §167.
neg_one: octa, MMIX-ARITH §4.
noted: bool, MMIX-PIPE §68.
o: octa, MMIX-PIPE §40.
octa = struct, MMIX-PIPE §17.
p: specnode *, MMIX-PIPE §40.
page_b: int [], MMIX-PIPE §238.
page_bad: bool, MMIX-PIPE §238.
page_mask: octa, MMIX-PIPE §238.
page_r: int, MMIX-PIPE §238.
page_s: int, MMIX-PIPE §238.
print_cache: void (), MMIX-PIPE §176.
print_locks: void (), MMIX-PIPE §39.
print_pipe: void (), MMIX-PIPE §253.
print_stats: void (), MMIX-PIPE §162.
printf: int (), <stdio.h>.
read_hex: octa (), §17.
rI = 12, MMIX-PIPE §52.
rK = 15, MMIX-PIPE §52.
Scache: cache *, MMIX-PIPE §168.
set: cacheset *, MMIX-PIPE §167.
specnode = struct, MMIX-PIPE §40.
sscanf: int (), <stdio.h>.
tag: octa, MMIX-PIPE §167.
tail: fetch *, MMIX-PIPE §69.
ticks: Extern octa, MMIX-PIPE §87.
tmp: octa, §16.
zero_octa: octa, MMIX-ARITH §4.
26. Names of the sections.

⟨Cases for interaction 14, 15, 18, 19, 21, 22, 23, 24⟩ Used in section 13.
⟨Change the current location 7⟩ Used in section 6.
⟨Global variables 5, 16, 25⟩ Used in section 2.
⟨Input a rudimentary hexadecimal file 6⟩ Used in section 4.
⟨Input an MMIX binary file 9⟩ Used in section 4.
⟨Input consecutive octabytes beginning at cur_loc 11⟩ Used in section 9.
⟨Input the program 4⟩ Used in section 2.
⟨Parse the command line 3⟩ Used in section 2.
⟨Read an octabyte and advance cur_loc 8⟩ Used in section 6.
⟨Run the simulation interactively 13⟩ Used in section 2.
⟨Set up the canned environment 12⟩ Used in section 9.
⟨Subroutines 10, 17, 20⟩ Used in section 2.
MMOTYPE 

1. Introduction. This program reads a binary mmo file output by the MMIXAL
processor and lists it in human-readable form. It lists only the symbol table, if
invoked with the -s option. It lists also the tetrabytes of input, if invoked with the
-v option.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
⟨ Prototype preparations 5 ⟩
⟨ Type definitions 7 ⟩
⟨ Global variables 4 ⟩
⟨ Subroutines 8 ⟩
int main(argc, argv)
    int argc; char *argv[];
{
  register int j, delta, postamble = 0;
  register char *p;
  ⟨ Process the command line 2 ⟩;
  ⟨ Initialize everything 3 ⟩;
  ⟨ List the preamble 23 ⟩;
  do ⟨ List the next item 13 ⟩ while (¬postamble);
  ⟨ List the postamble 24 ⟩;
  ⟨ List the symbol table 25 ⟩;
  return 0;
}

2. ⟨ Process the command line 2 ⟩ ≡
  listing = 1, verbose = 0;
  for (j = 1; j < argc - 1 ∧ argv[j][0] ≡ '-' ∧ argv[j][2] ≡ '\0'; j++) {
    if (argv[j][1] ≡ 's') listing = 0;
    else if (argv[j][1] ≡ 'v') verbose = 1;
    else break;
  }
  if (j ≠ argc - 1) {
    fprintf(stderr, "Usage:␣%s␣[-s]␣[-v]␣mmofile\n", argv[0]);
    exit(-1);
  }

This code is used in section 1. 
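
The option scan of section 2 can be tried in isolation. The following is a minimal sketch in ANSI-prototype C, not the program's own code: the function name parse_options and its return convention are assumptions made for illustration. It also adds one guard that the listing above does not have, namely a check that the option letter exists before the byte after it is examined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scan leading "-s" / "-v" options.
   Returns 0 when exactly one file name follows the options,
   and -1 when the usage message should be printed instead. */
int parse_options(int argc, char *argv[], int *listing, int *verbose)
{
    int j;
    assert(listing != NULL && verbose != NULL);
    *listing = 1;
    *verbose = 0;
    for (j = 1; j < argc - 1 && argv[j][0] == '-'
                && argv[j][1] != '\0'      /* added guard, see lead-in */
                && argv[j][2] == '\0'; j++) {
        if (argv[j][1] == 's') *listing = 0;
        else if (argv[j][1] == 'v') *verbose = 1;
        else break;
    }
    return j == argc - 1 ? 0 : -1;
}
```

With the arguments "-s -v prog.mmo" the sketch turns listing off and verbose on; with two file names, or with none, it reports a usage error.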

3. ⟨ Initialize everything 3 ⟩ ≡
  mmo_file = fopen(argv[argc - 1], "rb");
  if (¬mmo_file) {
    fprintf(stderr, "Can't␣open␣file␣%s!\n", argv[argc - 1]);
    exit(-2);
  }

See also sections 12 and 17. 

This code is used in section 1. 



D.E. Knuth: MMIXware, LNCS 1750, pp. 510–523, 2014.
DOI: 10.1007/3-540-46611-8_11 © Author and Springer-Verlag Berlin Heidelberg 2014
4. ⟨ Global variables 4 ⟩ ≡
  int listing;     /* are we listing everything? */
  int verbose;     /* are we also showing the tetras of input as they are read? */
  FILE *mmo_file;  /* the input file */

See also sections 11, 16, and 29. 

This code is used in section 1. 



5. ⟨ Prototype preparations 5 ⟩ ≡
#ifdef __STDC__
#define ARGS(list) list
#else
#define ARGS(list) ()
#endif

This code is used in section 1. 



6. A complete definition of mmo format appears in the MMIXAL document. Here we 
need to define only the basic constants used for interpretation. 



#define mm         #98  /* the escape code of mmo format */
#define lop_quote  #0   /* the quotation lopcode */
#define lop_loc    #1   /* the location lopcode */
#define lop_skip   #2   /* the skip lopcode */
#define lop_fixo   #3   /* the octabyte-fix lopcode */
#define lop_fixr   #4   /* the relative-fix lopcode */
#define lop_fixrx  #5   /* extended relative-fix lopcode */
#define lop_file   #6   /* the file name lopcode */
#define lop_line   #7   /* the file position lopcode */
#define lop_spec   #8   /* the special hook lopcode */
#define lop_pre    #9   /* the preamble lopcode */
#define lop_post   #a   /* the postamble lopcode */
#define lop_stab   #b   /* the symbol table lopcode */
#define lop_end    #c   /* the end-it-all lopcode */
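
To see how these constants are used, here is a small sketch that classifies a packed tetrabyte. It assumes, as the main loop of section 13 suggests, that the most significant byte holds the escape code and the next byte holds the lopcode. The names lopcode_of and lop_names are hypothetical, and the sketch ignores the quoting rule: a tetrabyte that follows lop_quote is data even if it begins with the escape code.

```c
#include <stdint.h>

#define MM 0x98   /* the escape code, written #98 in section 6 */

/* Names indexed by lopcode value, in the order given in section 6. */
const char *const lop_names[13] = {
    "lop_quote", "lop_loc",  "lop_skip", "lop_fixo", "lop_fixr",
    "lop_fixrx", "lop_file", "lop_line", "lop_spec", "lop_pre",
    "lop_post",  "lop_stab", "lop_end"
};

/* Return the lopcode (0..12) carried by tet, or -1 if tet does not
   start with the escape code or carries an unknown lopcode. */
int lopcode_of(uint32_t tet)
{
    unsigned x = (unsigned)((tet >> 16) & 0xff);  /* second byte */
    if ((tet >> 24) != MM) return -1;             /* first byte  */
    return x <= 0xc ? (int)x : -1;
}
```

For example, the tetrabyte #98090101 is classified as lopcode 9, so lop_names[9] gives "lop_pre"; an ordinary tetrabyte such as #12345678 yields -1.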



__STDC__, Standard C.
exit: void (), <stdlib.h>.
FILE, <stdio.h>.
fopen: FILE *(), <stdio.h>.
fprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
7. Low-level arithmetic. This program is intended to work correctly whenever
an int has at least 32 bits.

⟨ Type definitions 7 ⟩ ≡
  typedef unsigned char byte;      /* a monobyte */
  typedef unsigned int tetra;      /* a tetrabyte */
  typedef struct { tetra h, l; } octa;  /* an octabyte */

This code is used in section 1. 

8. The incr subroutine adds a signed integer to an (unsigned) octabyte. 

⟨ Subroutines 8 ⟩ ≡
  octa incr ARGS((octa, int));
  octa incr(o, delta)
      octa o;
      int delta;
  {
    register tetra t;
    octa x;
    if (delta ≥ 0) {
      t = #ffffffff - delta;
      if (o.l ≤ t) x.l = o.l + delta, x.h = o.h;
      else x.l = o.l - t - 1, x.h = o.h + 1;
    }
    else {
      t = -delta;
      if (o.l ≥ t) x.l = o.l - t, x.h = o.h;
      else x.l = o.l + (#ffffffff + delta) + 1, x.h = o.h - 1;
    }
    return x;
  }

See also sections 9, 10, and 26. 

This code is used in section 1. 
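
The carry and borrow logic of incr is easy to check with a stand-alone version. The sketch below restates it with fixed-width types from <stdint.h>; the names octa32 and incr32 are hypothetical, chosen so as not to clash with the program's own octa and incr, and the casts are an assumption about how one might keep all arithmetic unsigned.

```c
#include <stdint.h>

/* Hypothetical fixed-width counterpart of the octa type of section 7. */
typedef struct { uint32_t h, l; } octa32;

/* Add a signed delta to a 64-bit quantity held as two 32-bit halves,
   propagating a carry or borrow into the high half when needed. */
octa32 incr32(octa32 o, int32_t delta)
{
    octa32 x;
    if (delta >= 0) {
        uint32_t t = 0xffffffffu - (uint32_t)delta;  /* room left in o.l */
        if (o.l <= t) { x.l = o.l + (uint32_t)delta; x.h = o.h; }
        else          { x.l = o.l - t - 1;           x.h = o.h + 1; }
    } else {
        uint32_t t = 0u - (uint32_t)delta;           /* magnitude of delta */
        if (o.l >= t) { x.l = o.l - t; x.h = o.h; }
        else { x.l = o.l + (0xffffffffu + (uint32_t)delta) + 1; x.h = o.h - 1; }
    }
    return x;
}
```

Adding 4 to the value with h = 0 and l = #fffffffe carries into the high half and gives h = 1, l = 2; subtracting 4 from that result borrows and returns the original value.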
9. Low-level input. The tetrabytes of an mmo file are stored in friendly big-endian
fashion, but this program is supposed to work also on computers that are
little-endian. Therefore we read four successive bytes and pack them into a tetrabyte,
instead of reading a single tetrabyte.

⟨ Subroutines 8 ⟩ +≡
  void read_tet ARGS((void));
  void read_tet()
  {
    if (fread(buf, 1, 4, mmo_file) ≠ 4) {
      fprintf(stderr, "Unexpected␣end␣of␣file␣after␣%d␣tetras!\n", count);
      exit(-3);
    }
    yz = (buf[2] ≪ 8) + buf[3];
    tet = (((buf[0] ≪ 8) + buf[1]) ≪ 16) + yz;
    if (verbose) printf("␣␣%08x\n", tet);
    count++;
  }
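
The packing step can be shown on its own, independent of the host's byte order. This is a sketch only: pack_tetra is a hypothetical name (read_tet does the same work inline on the global buf), and the casts to uint32_t are an addition that keeps the shifts in unsigned arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four bytes, most significant first, into one 32-bit value. */
uint32_t pack_tetra(const unsigned char b[4])
{
    uint32_t yz;
    assert(b != NULL);
    yz = ((uint32_t)b[2] << 8) + b[3];            /* two low-order bytes */
    return ((((uint32_t)b[0] << 8) + b[1]) << 16) + yz;
}
```

The bytes 98 09 01 01 (hexadecimal), in file order, pack to the tetrabyte #98090101 on any machine, which is exactly what a big-endian reader would have obtained by loading the four bytes at once.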

10. ⟨ Subroutines 8 ⟩ +≡
  byte read_byte ARGS((void));
  byte read_byte()
  {
    register byte b;
    if (¬byte_count) read_tet();
    b = buf[byte_count];
    byte_count = (byte_count + 1) & 3;
    return b;
  }

11. ⟨ Global variables 4 ⟩ +≡
  int count;       /* the number of tetrabytes we've read */
  int byte_count;  /* index of the next-to-be-read byte */
  byte buf[4];     /* the most recently read bytes */
  int yz;          /* the two least significant bytes */
  tetra tet;       /* buf bytes packed big-endianwise */

12. ⟨ Initialize everything 3 ⟩ +≡
  count = byte_count = 0;



ARGS = macro (), §5.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
fread: size_t (), <stdio.h>.
mmo_file: FILE *, §4.
printf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
verbose: int, §4.
13. The main loop. Now for the bread-and-butter part of this program.

⟨ List the next item 13 ⟩ ≡
  {
    read_tet();
  loop: if (buf[0] ≡ mm)
      switch (buf[1]) {
      case lop_quote: if (yz ≠ 1) err("YZ␣field␣of␣lop_quote␣should␣be␣1");
        read_tet(); break;
      ⟨ Cases for lopcodes in the main loop 18 ⟩
      default: err("Unknown␣lopcode");
      }
    if (listing) ⟨ List tet as a normal item 15 ⟩;
  }

This code is used in section 1. 

14. We want to catch all cases where the rules of mmo format are not obeyed. The 
err macro ameliorates this somewhat tedious chore. 

#define err(m)
          { fprintf(stderr, "Error␣in␣tetra␣%d:␣%s!\n", count, m); continue; }

15. In a normal situation, the newly read tetrabyte is simply supposed to be loaded
into the current location. We list not only the current location but also the current
file position, if cur_line is nonzero and cur_loc belongs to segment 0.

⟨ List tet as a normal item 15 ⟩ ≡
  {
    printf("%08x%08x:␣%08x", cur_loc.h, cur_loc.l, tet);
    if (¬cur_line) printf("\n");
    else {
      if (cur_loc.h & #e0000000) printf("\n");
      else {
        if (cur_file ≡ listed_file) printf("␣(line␣%d)\n", cur_line);
        else {
          printf("␣(\"%s\",␣line␣%d)\n", file_name[cur_file], cur_line);
          listed_file = cur_file;
        }
      }
      cur_line++;
    }
    cur_loc = incr(cur_loc, 4); cur_loc.l &= -4;
  }

This code is used in section 13. 

16. ⟨ Global variables 4 ⟩ +≡
  octa cur_loc;          /* the current location */
  int listed_file;       /* the most recently listed file number */
  int cur_file;          /* the most recently selected file number */
  int cur_line;          /* the current position in cur_file */
  char *file_name[256];  /* file names seen */
  octa tmp;              /* an octabyte of temporary interest */

17. ⟨ Initialize everything 3 ⟩ +≡
  cur_loc.h = cur_loc.l = 0;
  listed_file = cur_file = -1;
  cur_line = 0;



buf: byte [], §11.
count: int, §11.
fprintf: int (), <stdio.h>.
h: tetra, §7.
incr: octa (), §8.
l: tetra, §7.
listing: int, §4.
lop_quote = #0, §6.
mm = #98, §6.
octa = struct, §7.
printf: int (), <stdio.h>.
read_tet: void (), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
yz: int, §11.



18. The simple lopcodes. We have already implemented lop_quote, which falls
through to the normal case after reading an extra tetrabyte. Now let's consider the
other lopcodes in turn.

#define y buf[2]  /* the next-to-least significant byte */
#define z buf[3]  /* the least significant byte */

⟨ Cases for lopcodes in the main loop 18 ⟩ ≡
  case lop_loc: if (z ≡ 2) {
      j = y; read_tet(); cur_loc.h = (j ≪ 24) + tet;
    } else if (z ≡ 1) cur_loc.h = y ≪ 24;
    else err("Z␣field␣of␣lop_loc␣should␣be␣1␣or␣2");
    read_tet(); cur_loc.l = tet;
    continue;
  case lop_skip: cur_loc = incr(cur_loc, yz); continue;

See also sections 19, 20, 21, and 22. 

This code is used in section 13. 

19. Fixups load information out of order, when future references have been resolved. 
The current file name and line number are not considered relevant. 

⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
  case lop_fixo: if (z ≡ 2) {
      j = y; read_tet(); tmp.h = (j ≪ 24) + tet;
    } else if (z ≡ 1) tmp.h = y ≪ 24;
    else err("Z␣field␣of␣lop_fixo␣should␣be␣1␣or␣2");
    read_tet(); tmp.l = tet;
    if (listing) printf("%08x%08x:␣%08x%08x\n", tmp.h, tmp.l, cur_loc.h, cur_loc.l);
    continue;
  case lop_fixr: delta = yz;
    goto fixr;
  case lop_fixrx: j = yz; if (j ≠ 16 ∧ j ≠ 24)
      err("YZ␣field␣of␣lop_fixrx␣should␣be␣16␣or␣24");
    read_tet();
    delta = tet;
    if (delta & #fe000000) err("increment␣of␣lop_fixrx␣is␣too␣large");
  fixr: tmp = incr(cur_loc, -(delta ≥ #1000000 ? (delta & #ffffff) - (1 ≪ j) : delta) ≪ 2);
    if (listing) printf("%08x%08x:␣%08x\n", tmp.h, tmp.l, delta);
    continue;
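
The expression at the label fixr is dense, so a stand-alone restatement may help. The sketch below computes the byte offset that is added to cur_loc, under the reading that a delta with bit 24 set encodes a negative count of tetrabytes relative to 2^j. The name fixup_offset is hypothetical, and multiplication by 4 replaces the left shift so that negative intermediate values are never shifted.

```c
/* Byte offset, relative to the current location, of the instruction
   that a relative fixup refers to.  delta is the YZ field (lop_fixr)
   or the following tetrabyte (lop_fixrx); j is 16 or 24 for lop_fixrx
   and is not consulted when delta is below 0x1000000. */
long fixup_offset(long delta, int j)
{
    long d = delta >= 0x1000000L
           ? (delta & 0xffffffL) - (1L << j)   /* negative case */
           : delta;                            /* nonnegative case */
    return -d * 4;                             /* tetrabytes to bytes */
}
```

A plain lop_fixr with YZ = 3 therefore points 12 bytes before the current location, while a lop_fixrx whose tetrabyte is #1fffffe (with j = 24) points 8 bytes after it.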

20. The space for file names isn’t allocated until we are sure we need it. 

⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
  case lop_file: if (file_name[y]) {
      for (j = z; j > 0; j--) read_tet();
      cur_file = y;
      if (z) err("Two␣file␣names␣with␣the␣same␣number");
    } else {
      if (¬z) err("No␣name␣given␣for␣newly␣selected␣file");
      file_name[y] = (char *) calloc(4 * z + 1, 1);
      if (¬file_name[y]) {
        fprintf(stderr, "No␣room␣to␣store␣the␣file␣name!\n"); exit(-4);
      }
      cur_file = y;
      for (j = z, p = file_name[y]; j > 0; j--, p += 4) {
        read_tet();
        *p = buf[0]; *(p + 1) = buf[1]; *(p + 2) = buf[2]; *(p + 3) = buf[3];
      }
    }
    cur_line = 0; continue;
  case lop_line: if (cur_file < 0) err("No␣file␣was␣selected␣for␣lop_line");
    cur_line = yz; continue;

21. Special bytes in the file might be in synch with the current location and/or the 
current file position, so we list those parameters too. 

⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
  case lop_spec: if (listing) {
      printf("Special␣data␣%d␣at␣loc␣%08x%08x", yz, cur_loc.h, cur_loc.l);
      if (¬cur_line) printf("\n");
      else if (cur_file ≡ listed_file) printf("␣(line␣%d)\n", cur_line);
      else {
        printf("␣(\"%s\",␣line␣%d)\n", file_name[cur_file], cur_line);
        listed_file = cur_file;
      }
    }
    while (1) {
      read_tet();
      if (buf[0] ≡ mm) {
        if (buf[1] ≠ lop_quote ∨ yz ≠ 1) goto loop;  /* end of special data */
        read_tet();
      }
      if (listing) printf("␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣␣%08x\n", tet);
    }



buf: byte [], §11.
calloc: void *(), <stdlib.h>.
cur_file: int, §16.
cur_line: int, §16.
cur_loc: octa, §16.
delta: int, §8.
err = macro (), §14.
exit: void (), <stdlib.h>.
file_name: char *[], §16.
fprintf: int (), <stdio.h>.
h: tetra, §7.
incr: octa (), §8.
j: register int, §1.
l: tetra, §7.
listed_file: int, §16.
listing: int, §4.
loop: label, §13.
lop_file = #6, §6.
lop_fixo = #3, §6.
lop_fixr = #4, §6.
lop_fixrx = #5, §6.
lop_line = #7, §6.
lop_loc = #1, §6.
lop_quote = #0, §6.
lop_skip = #2, §6.
lop_spec = #8, §6.
mm = #98, §6.
p: register char *, §1.
printf: int (), <stdio.h>.
read_tet: void (), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
yz: int, §11.
22. The other cases shouldn't appear in the main loop.

⟨ Cases for lopcodes in the main loop 18 ⟩ +≡
  case lop_pre: err("Can't␣have␣another␣preamble");
  case lop_post: postamble = 1;
    if (y) err("Y␣field␣of␣lop_post␣should␣be␣zero");
    if (z < 32) err("Z␣field␣of␣lop_post␣must␣be␣32␣or␣more");
    continue;
  case lop_stab: err("Symbol␣table␣must␣follow␣postamble");
  case lop_end: err("Symbol␣table␣can't␣end␣before␣it␣begins");
23. The preamble and postamble. Now here's what we do before and after
the main loop.

⟨ List the preamble 23 ⟩ ≡
  read_tet();  /* read the first tetrabyte of input */
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_pre) {
    fprintf(stderr, "Input␣is␣not␣an␣MMO␣file␣(first␣two␣bytes␣are␣wrong)!\n");
    exit(-5);
  }
  if (y ≠ 1) fprintf(stderr,
      "Warning:␣I'm␣reading␣this␣file␣as␣version␣1,␣not␣version␣%d!\n", y);
  if (z > 0) {
    j = z;
    read_tet();
    if (listing) printf("File␣was␣created␣%s", asctime(localtime((time_t *) &tet)));
    for (j--; j > 0; j--) {
      read_tet();
      if (listing) printf("Preamble␣data␣%08x\n", tet);
    }
  }

This code is used in section 1. 
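
The preamble's first data tetrabyte is treated above as a creation time. On systems where time_t is wider than 32 bits, passing the address of a tetra where a time_t pointer is expected may not read the intended value, so one might prefer to convert by value. The following sketch does that; format_creation is a hypothetical name, the use of gmtime and strftime (rather than localtime and asctime) is a choice made here so that the result does not depend on the local time zone, and the interpretation of the tetrabyte as seconds since the start of 1970 is an assumption that holds on POSIX-like systems.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Write the creation time held in tet as "YYYY-MM-DD HH:MM:SS" (UTC).
   Returns the number of characters written, or 0 on failure. */
size_t format_creation(uint32_t tet, char *out, size_t n)
{
    time_t t = (time_t)tet;   /* widen by value; no pointer cast needed */
    struct tm *utc;
    assert(out != NULL);
    utc = gmtime(&t);
    if (utc == NULL) return 0;
    return strftime(out, n, "%Y-%m-%d %H:%M:%S", utc);
}
```

Under the stated assumption, a tetrabyte of zero is rendered as 1970-01-01 00:00:00.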

24. ⟨ List the postamble 24 ⟩ ≡
  for (j = z; j < 256; j++) {
    read_tet(); tmp.h = tet; read_tet();
    if (listing) {
      if (tmp.h ∨ tet) printf("g%03d:␣%08x%08x\n", j, tmp.h, tet);
      else printf("g%03d:␣0\n", j);
    }
  }

This code is used in section 1. 



asctime: char *(), <time.h>.
buf: byte [], §11.
err = macro (), §14.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
h: tetra, §7.
j: register int, §1.
listing: int, §4.
localtime: struct tm *(), <time.h>.
lop_end = #c, §6.
lop_post = #a, §6.
lop_pre = #9, §6.
lop_stab = #b, §6.
mm = #98, §6.
postamble: register int, §1.
printf: int (), <stdio.h>.
read_tet: void (), §9.
stderr: FILE *, <stdio.h>.
tet: tetra, §11.
tmp: octa, §16.
y = macro, §18.
z = macro, §18.
25. The symbol table. Finally we come to the symbol table, which is the most
interesting part of this program because it recursively traces an implicit ternary trie
structure.

⟨ List the symbol table 25 ⟩ ≡
  read_tet();
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_stab) {
    fprintf(stderr, "Symbol␣table␣does␣not␣follow␣the␣postamble!\n");
    exit(-6);
  }
  if (yz) fprintf(stderr, "YZ␣field␣of␣lop_stab␣should␣be␣zero!\n");
  printf("Symbol␣table␣(beginning␣at␣tetra␣%d):\n", count);
  stab_start = count;
  sym_ptr = sym_buf;
  print_stab();
  ⟨ Check the lop_end 30 ⟩;

This code is used in section 1. 

26. The main work is done by a recursive subroutine called print_stab, which
manipulates a global array sym_buf containing the current symbol prefix; the global
variable sym_ptr points to the first unfilled character of that array.

⟨ Subroutines 8 ⟩ +≡
  void print_stab ARGS((void));
  void print_stab()
  {
    register int m = read_byte();  /* the master control byte */
    register int c;  /* the character at the current trie node */
    register int j, k;
    if (m & #40) print_stab();  /* traverse the left subtrie, if it is nonempty */
    if (m & #2f) {
      ⟨ Read the character c 27 ⟩;
      *sym_ptr++ = c;
      if (sym_ptr ≡ &sym_buf[sym_length_max]) {
        fprintf(stderr, "Oops,␣the␣symbol␣is␣too␣long!\n"); exit(-7);
      }
      if (m & #f) ⟨ Print the current symbol with its equivalent and serial number 28 ⟩;
      if (m & #20) print_stab();  /* traverse the middle subtrie */
      sym_ptr--;
    }
    if (m & #10) print_stab();  /* traverse the right subtrie, if it is nonempty */
  }
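
The traversal order of print_stab can be exercised on a byte array in memory instead of a file. The sketch below is not the program's code: the walker structure and the functions next_byte, record, and walk are hypothetical, equivalents and serial numbers are skipped rather than printed, and symbols are simply collected into a buffer, one per line. Like the listing routine of section 28, it omits the first character of each symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYM 64

typedef struct {
    const unsigned char *p, *end;  /* unread part of the encoded trie */
    char prefix[MAX_SYM];          /* characters on middle transitions */
    size_t len;                    /* number of characters in prefix */
    char out[256];                 /* collected symbols, one per line */
} walker;

static int next_byte(walker *w)    /* yields 0 once the input is exhausted */
{
    return w->p < w->end ? *w->p++ : 0;
}

static void record(walker *w)      /* append prefix, minus its first char */
{
    size_t used = strlen(w->out);
    w->prefix[w->len] = '\0';
    if (w->len > 0 && used + w->len + 1 < sizeof w->out) {
        strcat(w->out, w->prefix + 1);
        strcat(w->out, "\n");
    }
}

void walk(walker *w)
{
    int m = next_byte(w);                     /* master control byte */
    int j;
    if (m & 0x40) walk(w);                    /* left subtrie */
    if (m & 0x2f) {
        if (m & 0x80) (void)next_byte(w);     /* high byte of 16-bit char */
        assert(w->len + 1 < MAX_SYM);
        w->prefix[w->len++] = (char)next_byte(w);
        if (m & 0xf) {
            j = m & 0xf;                      /* length code of equivalent */
            j = j == 15 ? 1 : j <= 8 ? j : j - 8;
            while (j-- > 0) (void)next_byte(w);            /* skip equivalent */
            while (w->p < w->end && next_byte(w) < 128) ;  /* skip serial no. */
            record(w);
        }
        if (m & 0x20) walk(w);                /* middle subtrie */
        w->len--;
    }
    if (m & 0x10) walk(w);                    /* right subtrie */
}
```

For instance, the eleven bytes 20 3a 11 61 05 81 02 62 01 00 82 (hexadecimal) describe a root node for ':' whose middle subtrie holds 'a' and, to its right, 'b'; walking them collects the two symbols a and b, in that order.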
27. The present implementation doesn't support Unicode; characters with more
than 8-bit codes are printed as '?'. However, the changes for 16-bit codes would be
quite easy if proper fonts for Unicode output were available. In that case, sym_buf
would be an array of wyde characters.

⟨ Read the character c 27 ⟩ ≡
  if (m & #80) j = read_byte();  /* 16-bit character */
  else j = 0;
  c = read_byte();
  if (j) c = '?';  /* oops, we can't print (j ≪ 8) + c easily at this time */

This code is used in section 26. 

28. ⟨ Print the current symbol with its equivalent and serial number 28 ⟩ ≡
  {
    *sym_ptr = '\0';
    j = m & #f;
    if (j ≡ 15) sprintf(equiv_buf, "$%03d", read_byte());
    else if (j ≤ 8) {
      strcpy(equiv_buf, "#");
      for ( ; j > 0; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
      if (strcmp(equiv_buf, "#0000") ≡ 0) strcpy(equiv_buf, "?");  /* undefined */
    } else {
      strncpy(equiv_buf, "#20000000000000", 33 - 2 * j);
      equiv_buf[33 - 2 * j] = '\0';
      for ( ; j > 8; j--) sprintf(equiv_buf + strlen(equiv_buf), "%02x", read_byte());
    }
    for (j = k = read_byte(); ; k = read_byte(), j = (j ≪ 7) + k)
      if (k ≥ 128) break;  /* the serial number is now j - 128 */
    printf("␣␣␣␣%s␣=␣%s␣(%d)\n", sym_buf + 1, equiv_buf, j - 128);
  }

This code is used in section 26. 
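
Two small encodings are handled in section 28: the equivalent, whose length is governed by the low four bits of the master byte, and the serial number, stored seven bits at a time with the final byte flagged by its high bit. The sketch below restates both on in-memory bytes. The names format_equiv and decode_serial are hypothetical, and the prefix for the long form is built with memset instead of being copied from the constant string, which should give the same characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format the equivalent selected by j = m & 0xf from the bytes at b.
   out needs room for 20 chars; returns the number of bytes consumed. */
size_t format_equiv(int j, const unsigned char *b, char *out)
{
    size_t used = 0;
    assert(j >= 1 && j <= 15 && b != NULL && out != NULL);
    if (j == 15) {                         /* a register number */
        sprintf(out, "$%03d", (int)b[0]);
        return 1;
    }
    if (j <= 8) {                          /* j bytes of value */
        strcpy(out, "#");
        for (; j > 0; j--)
            sprintf(out + strlen(out), "%02x", (unsigned)b[used++]);
        if (strcmp(out, "#0000") == 0) strcpy(out, "?");   /* undefined */
        return used;
    }
    out[0] = '#';                          /* 9 <= j <= 14: "#2", zeros, */
    out[1] = '2';                          /* then j - 8 bytes of value  */
    memset(out + 2, '0', (size_t)(31 - 2 * j));
    out[33 - 2 * j] = '\0';
    for (; j > 8; j--)
        sprintf(out + strlen(out), "%02x", (unsigned)b[used++]);
    return used;
}

/* Decode a serial number from at most n bytes at p; *used receives the
   number of bytes read.  Returns -1 if no terminating byte is found. */
long decode_serial(const unsigned char *p, size_t n, size_t *used)
{
    unsigned long j = 0;
    size_t i;
    assert(p != NULL && used != NULL);
    for (i = 0; i < n; i++) {
        j = (j << 7) + p[i];
        if (p[i] >= 128) { *used = i + 1; return (long)(j - 128); }
    }
    *used = n;
    return -1;
}
```

With j = 2 and the bytes 01 00 the equivalent is "#0100"; with j = 9 and the single byte 08 it is a sixteen-digit value that begins with 2 and ends with 8. The serial bytes 01 80 decode to 128.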

29. #define sym_length_max 1000

⟨ Global variables 4 ⟩ +≡
  int stab_start;  /* where the symbol table began */
  char sym_buf[sym_length_max];
      /* the characters on middle transitions to current node */
  char *sym_ptr;  /* the character in sym_buf following the current prefix */
  char equiv_buf[20];  /* equivalent of the current symbol */



ARGS = macro (), §5.
buf: byte [], §11.
count: int, §11.
exit: void (), <stdlib.h>.
fprintf: int (), <stdio.h>.
lop_end = #c, §6.
lop_stab = #b, §6.
mm = #98, §6.
printf: int (), <stdio.h>.
read_byte: byte (), §10.
read_tet: void (), §9.
sprintf: int (), <stdio.h>.
stderr: FILE *, <stdio.h>.
strcmp: int (), <string.h>.
strcpy: char *(), <string.h>.
strlen: size_t (), <string.h>.
strncpy: char *(), <string.h>.
yz: int, §11.
30. ⟨ Check the lop_end 30 ⟩ ≡
  while (byte_count)
    if (read_byte()) fprintf(stderr, "Nonzero␣byte␣follows␣the␣symbol␣table!\n");
  read_tet();
  if (buf[0] ≠ mm ∨ buf[1] ≠ lop_end)
    fprintf(stderr, "The␣symbol␣table␣isn't␣followed␣by␣lop_end!\n");
  else if (count ≠ stab_start + yz + 1)
    fprintf(stderr, "YZ␣field␣at␣lop_end␣should␣have␣been␣%d!\n", count - yz - 1);
  else {
    if (verbose) printf("Symbol␣table␣ends␣at␣tetra␣%d.\n", count);
    if (fread(buf, 1, 1, mmo_file))
      fprintf(stderr, "Extra␣bytes␣follow␣the␣lop_end!\n");
  }

This code is used in section 25. 
31. Names of the sections.

⟨ Cases for lopcodes in the main loop 18, 19, 20, 21, 22 ⟩ Used in section 13.
⟨ Check the lop_end 30 ⟩ Used in section 25.
⟨ Global variables 4, 11, 16, 29 ⟩ Used in section 1.
⟨ Initialize everything 3, 12, 17 ⟩ Used in section 1.
⟨ List tet as a normal item 15 ⟩ Used in section 13.
⟨ List the next item 13 ⟩ Used in section 1.
⟨ List the postamble 24 ⟩ Used in section 1.
⟨ List the preamble 23 ⟩ Used in section 1.
⟨ List the symbol table 25 ⟩ Used in section 1.
⟨ Print the current symbol with its equivalent and serial number 28 ⟩ Used in section 26.
⟨ Process the command line 2 ⟩ Used in section 1.
⟨ Prototype preparations 5 ⟩ Used in section 1.
⟨ Read the character c 27 ⟩ Used in section 26.
⟨ Subroutines 8, 9, 10, 26 ⟩ Used in section 1.
⟨ Type definitions 7 ⟩ Used in section 1.



buf: byte [], §11.
byte_count: int, §11.
count: int, §11.
fprintf: int (), <stdio.h>.
fread: size_t (), <stdio.h>.
lop_end = #c, §6.
mm = #98, §6.
mmo_file: FILE *, §4.
printf: int (), <stdio.h>.
read_byte: byte (), §10.
read_tet: void (), §9.
stab_start: int, §29.
stderr: FILE *, <stdio.h>.
verbose: int, §4.
yz: int, §11.
MASTER INDEX

The following list, a compilation of the indexes produced from all the MMIXware
programs and documentation, shows the section numbers where each identifier makes
an appearance. Underlined numbers indicate a place of definition. Single-letter
identifiers are indexed only when they are defined.

Further characteristics of the program segments, such as ‘system dependencies’,
can also be found here, together with significant error messages and other indexable
things like the names of people whose work is cited.

Digits follow letters in the lexicographic order of this index. For example, ‘tl ’ 
follows ‘W; and ‘16ADDU’ precedes ‘2ADDU’. 



?? : MMIX-PIPE 25. 

•/,'/. : MMIX-CONFIG 18. 

STDC : MMIX-ARITH 2, MMIX-IO 2, MMIX- 

PIPE 6, MMIX-SIM 11, MMIXAL 31, 
MMOTYPE 5. 

a: MMIX-ARITH 28, 29, MMIX-PIPE M, 91, 

167 . 381 . 384 . mmix-Sim 61, 114, 117 . 
aa: MMix-CONFiG 16, 23, 31, 32, MMIX- 

PIPE lOT, 177, 181, 186, lOT, 1^, 191, 193, 
196, 199, 205, 233, 234. 
aaaaa-. MMIX-PIPE 237, 243, 244. 
abort : MMIX-IO 8. 

absolute value, floating point: MMIX 13. 

ABSTIME : MMIX-PIPE 89, MMIX-SIM 77. 

acc: MMIX-ARITH 8, 11, 12, 1^, 19, MMIXAL 29, 

83, 92, 93, 94, 95, 96, 107, 109, 126, 127, 131. 

access-time: MMIX-CONFIG 16, 23, MMIX- 

PIPE lOT, 217, 224, 230, 233, 234, 257, 261, 
262, 266, 267, 268, 270, 271, 272, 273, 274, 
288, 291, 292, 295, 296, 300, 326, 353, 354, 
358, 359, 360, 364, 365, 366. 
acctm: MMIX-CONFIG 1^, 15, 23. 

accuracy loss: MMIX-ARITH 31. 

ADD : MMIX 9, MMIX-PIPE MMIX-SIM 54, 

84, MMIXAL 63. 

add: MMIX-CONFIG 28, MMIX-PIPE^, 51, 140. 

add.go: MMIX-PIPE 331 . 

ADDI : MMIX-PIPE £1, MMIX-SIM M, 84. 

addr: MMIX-IO 4, MMIX-MBM 2, 3, MMIX- 

PIPE 40, 43, 44, 73, 89, 95, 100, 115, 116, 
144, 208, 209, 2W, 212, 213, 21&, 219, 
236, 240, 2«, 251, 2M, 256, 257, 259, 260, 
261, 262, 281, 297, 356, 378, MR, 381, 2M, 
MMIX-SIM 20, llA, 117 . 
addr.found: MMIX-PIPE 256 . 

ADDU : MMIX 7, 9, MMIX-PIPE M., MMIX-SIM 54, 

85, MMIXAL 63. 

addu: MMIX-CONFIG 28, MMIX-PIPE 

51, 139. 

ADDUI : MMIX-PIPE £[_, MMIX-SIM M, 85, 131. 

Advanced Micro Devices: MMIX 42. 

after: MMIX-PIPE 282 . 

alf: MMIX-PIPE 192, 193, 195, 205 . 



align.bits: MMIXAL §2, 102, 107. 

alloc.cache: MMIX-CONFIG 35. 

alloc.slot : MMIX-PIPE 204, 205, 218, 222, 225, 

261, 272, 274, 276, 298, 300, 326. 

Alpha computers: MMIX 45, MMIX-PIPE 217. 

alt.name: MMIX-SIM 24 . 

AND : MMIX 10, MMIX-PIPE MMIX-SIM 54, 

86, MMIXAL 63. 

and: MMIX-CONFIG 28, MMIX-PIPE 51, 

138, MMIXAL 97, 101. 

Anderson, Jennifer-Ann Monique: MMIX 40. 

ANDI : MMIX-PIPE M., MMIX-SIM M, 86. 

ANDN : MMIX 10, MMIX-PIPE MMIX-SIM M, 

86, MMIXAL 63. 

andn: MMIX-CONFIG 28, MMIX-PIPE 

51, 138. 

ANDNH : MMIX 13, MMIX-PIPE 47, MMIX-SIM M, 

86, MMIXAL 63. 

ANDNI : MMIX-PIPE 47, MMIX-SIM M, 86. 

ANDNL : MMIX 13, MMIX-PIPE M., MMIX-SIM 54, 

86, MMIXAL 63. 

ANDNMH : MMIX 13, MMIX-PIPE £[_, MMIX- 

SIM 54, 86, MMIXAL 63. 

ANDNML : MMIX 13, MMIX-PIPE £[_, MMIX- 

SIM M, 86, MMIXAL 63. 

Aragon, Cecilia Rodriguez: MMIX-SIM 16. 

arg: MMIX-SIM 143 . 145, 146. 

arg.count: MMIX-PIPE 374 . 380, MMIX- 

SIM IW, 111. 
arg Joe: MMIX-PIPE 380 . 

arge: MMIX-SIM 37, 141 . 142, 163, MMIXAL 136 . 

137, MMMIX 2, 3, MMOTYPE 1, 2, 3. 

ARCS : MMIX-ARITH 2, MMIX-IO 2, MMIX- 

PIPE 6, MMIX-SIM 11, MMIXAL 31, 
MMOTYPE 5. 

argv: MMIX-SIM 141 . 142, 144, MMIXAL 136 . 

137, MMMIX 2, 3, MMOTYPE 1, 2, 3. 
arith.exc: MMIX-PIPE 44, 46, 59, 98, 100, 

146, 307, 308. 

ASCII: MMIX 6. 

asetime: MMOTYPE 23. 

assemble: MMIXAL 52, HI, 128. 

assemble Jnst : MMIXAL 119 . 129, 130, 131. 
assemble_X: MMIXAL 119, 124, 125, 126, 127.

assembly language: MMIXAL 1. 

assoc: MMIX-CONFIG 1^, 15, 23. 

AT&T Bell Laboratories: MMIX 42. 

atomic instruction: MMIX 31. 

Attempt to get characters . . . : MMIX- 

PIPB 381. 

Attempt to put characters . . . : MMIX- 

PIPB 384. 

aux: MMIX-ARITH 8, 9, 11, 12, 13, 14, 19, 24, 

43, 45, MMix-PiPB 20, 21, 343, mmix-Sim 13, 
37, 88, 155, 159, mmixal 27, 28, 101. 
avoid.D: MMIX-PIPE 273, 277 . 

awaken: MMix-PiPB 125 . 222, 224, 245. 

b: MMIX-ARITH 28, 29, M, MMIX-PIPB 44, M, 

1^, lOT, 172, MMIX-SIM 27, W, 91, 160 , 
MMIXAL MMOTYPE IT. 

B_BIT : MMIX-PIPE M, 118, 304, 323, 329, 

330, 332, 336, 337. 
backwardAocal: MMIXAL 90, 91, 109. 

backward JocaLhost : MMIXAL 89, 90, 91. 

Bad object file : MMIX-SIM 26. 

bad^address: MMMIX 9, 11, 25 . 

badjetch: MMIX-PIPE 288, 293, 296, 298, 301 . 

bad^guesses: MMIX-SIM 93, 139 . 140. 

bad^instjnask: MMIX-PIPB 304, 305 . 323. 

bad^resume: MMIX-PIPB 323 . 

bb: MMIX-CONFIG 16, 23, 30, 31, 32, 33, 35, 36, 

37, MMIX-PIPE 167, 170, 172, 179, 185, 193, 
201, 203, 216, 217, 218, 219, 221, 223, 224, 
226, 227, 228, 229, 259, 262, 265, 268, 271, 
273, 275, 276, 280, 292, 294, 364, 378, 379. 
BDIF : MMIX 11, MMIX-PIPB MMIX-SIM M, 

87, MMIXAL 63. 

bdif: MMIX-CONFIG 28, MMIX-PIPB 51, 344. 

BDIFI : MMIX-PIPB £1, MMIX-SIM 87. 

before: MMIX-PIPE 282 . 

Bentley, Jon Louis: MMIXAL 54. 

Berc, Lance Michael: MMIX 40. 

BEV : MMIX 17, MMIX-PIPE MMIX-SIM 54, 

93, MMIXAL 63. 

BEVB : MMIX-PIPB T7_, MMIX-SIM 54, 93. 

big-endian versus little-endian: MMIX 6, 12, 

MMIX-IO 16, MMIX-PIPB 304, MMIXAL 47, 
MMMIX 10. 

bignum: MMIX-ARITH 54, M, 60, 61, 62, 

66, 68, 81, 82. 

biqnum.compare: MMIX-ARITH 54, 61, 64, 

65, 83. 

bignum.dec: MMIX-ARITH 54, 62, 65, 83. 

bignum.double: MMIX-ARITH 83. 

bignum.prec: MMIX-ARITH 59, 62, 65, 83. 

bignum.times.ten: MMIX-ARITH 54, TO, 

64, 65, 82. 

binary files: MMMIX 9. 

binary-to-decimal conversion: MMIX-ARITH 54. 

binary. check: MMIXAL 101 . 

BinaryRead : MMIX-SIM 4, MMIXAL 69. 

BinaryReadWrite : MMIX-SIM 4, MMIXAL 69. 



BinaryWrite : MMIX-SIM 4, MMIXAL 69. 

bit stuffing: MMIX 6, 7, 19. 

bit.code.map : MMIX-PIPE M, 56. 

bits: MMIXAL §2, 64. 

bkpt: MMIX-SIM IT, 58, 63, 82, 83, 161, 162. 

blksz: MMIX-CONFIG 13, 15, 23. 

block.diff: MMIX-PIPE 217_, 219. 

BN : MMIX 17, MMIX-PIPE MMIX-SIM 54, 

93, MMIXAL 63. 

BNB : MMIX-PIPE £1, MMIX-SIM 54, 93. 

BNN : MMIX 17, MMIX-PIPE £7_, MMIX-SIM M, 

93, MMIXAL 63. 

BNNB : MMIX-PIPE £7_, MMIX-SIM 54, 93. 

BNP : MMIX 17, MMIX-PIPE £7_, MMIX-SIM M, 

93, MMIXAL 63. 

BNPB : MMIX-PIPE £7_, MMIX-SIM 54, 93. 

BNZ : MMIX 17, MMIX-PIPE £7_, MMIX-SIM M, 

93, MMIXAL 63. 

BNZB : MMIX-PIPE £1, MMIX-SIM 54, 93. 

BOD : MMIX 17, MMIX-PIPE 47, MMIX-SIM M, 

93, MMIXAL 63. 

BQDB : MMIX-PIPE £1, MMIX-SIM 54, 93. 

bool: MMIX-ARITH 1, 9, 29, 70, MMIX-PIPE 11, 

12, 20, 21, 40, 44, 65, 66, 68, 75, 148, 169, 
170, 175, 176, 202, 203, 238, 242, 303, 315, 
MMIX-SIM 9, 13, 48, 52, 61, 129, 140, 143, 
144, 150, 151, MMIXAL 26. 
booLmult: MMIX-ARITH 29, MMIX-PIPE 21, 

344, MMIX-SIM 13, 87. 

Boolean multiplication: MMIX 12. 

borrow : MMIX-ARITH 62. 

BP : MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 

93, MMIXAL 63. 
bp: MMMIX 15, IT. 

bp.a: MMIX-CONFIG 15, 37, MMIX-PIPE 150 . 

151, 152, 153. 

bp.amask: MMIX-PIPB 151, 152, 153, 154 . 

bp.b: MMIX-CONFIG 15, 37, MMIX-PIPE 150 . 

151, 152, 153. 

bp.bad.stat: MMIX-PIPE 154 . 155, 162. 

bp.bcmask: MMIX-PIPE 151, 152, 153, 154 . 

bp.c: MMIX-CONFIG 15, 37, MMIX-PIPB 150 . 

153. 

bp.cmask: MMIX-PIPB 151, 152, 153, 154 . 

bp.good.stat: MMIX-PIPB 154 . 155, 162. 

bp.n: MMIX-CONFIG 15, 37, MMIX-PIPE 150 . 

153. 

bp.nmask: MMIX-PIPE 152, 153, 154 . 

bp.npower: MMIX-PIPB 151, 152, 153, 154 . 160. 

bp.ok.stat: MMIX-PIPE 152, 154 . 162. 

bp.rev.stat: MMIX-PIPE 152, 154 . 162. 

bp.table: MMIX-CONFIG 37, MMIX-PIPE 150 , 

151, 152, 160, 162. 

BPB : MMIX-PIPE £1, MMIX-SIM 54, 93. 

br: MMIX-CONFIG 28, MMIX-PIPE 51, 

85, 106, 152, 155. 
break.inst: MMIX-SIM 107 . 

breakpoint: MMIX-PIPE 9, IT, 304, MMIX- 

SIM 63, 82, 83, 93, 107, 109, 127, 
128, 141, 149.

breakpoint. hit'. MMIX-PIPE 10, 12, 304. 

BSPEC : MMIXAL 20, 43, §2, 63, 132. 

buf -. MMix-ARiTH 75, 76, 79, mmix-io 4, 12, 

13, 14, 15, 16, 17, 18, 19, 20, mmix-mem 1, 
2, MMIX-PIPE 381 . 384 . MMIX-SIM 1^, 25, 
26, 27, 28, 29, 33, 35, 36, «, 114, 117 . 
MMIXAL 47, MMOTYPE 9, 10, 14_, 13, 18, 
20, 21, 23, 25, 30. 

buf.max: MMIX-ARITH 73, 74, 75. 

buf .pointer : MMIX-CONFIG 9, 10. 

buf.ptr: MMIXAL 33, 34, 102, 136. 

BUF_SIZE : MMIX-CONFIG 9, 10, MMMIX 5, 

6, 13, 16. 

buf. size-. MMIX-SIM 41, 42, 45, 143, 
MMIXAL 32, 34, 84, 137, 139 . 
buffer: MMIX-CONFIG 9, 10, MMIX-IO 12, 

14, 1^, MMIX-SIM 4, 41, 42, 45, 

MMIXAL 32, 34, 38, 41, mmmix 5, 6, 

7, 8, 13, 15, 18, 19, 21, 22. 

bufO: MMIX-ARITH 73, 74, 75, 79. 

bus.words: MMIX-CONFIG 36, 37, MMIX- 

PIPE 214, 216, 219, 223, 297. 
bypass: MMIXAL 45, 102 , 103, 104, 132. 

BYTE : MMIXAL 17, 63, 117. 

byte: MMIX 6. 

byte: MMIX-SIM K), 25, 27, MMOTYPE 7, 

10 , 11 . 

byte.count: MMIX-SIM 24, 25, 27, 

MMOTYPE 10, H, 12, 30. 
byte.diff: MMIX-ARITH 27, 28, MMIX-PIPE 21, 

344, MMIX-SIM 1^, 87. 

BZ : MMIX 17, MMIX-PIPE MMIX-SIM 54, 

93, MMIXAL 63. 

BZB : MMIX-PIPE TJ_, MMIX-SIM M, 93. 

C: MMIX-ARITH 29, MMIX-CONFIG 16, 23, 

MMIX-PIPE 25, 28, 31, 33> M. 23R, 

170 . 172 . 174. 176 , 179 . 181 , 183 . 185, 193, 
196 . 199 . 201 . 203, 205, 215, 217, 222, 224, 
237 . 326 , MMOTYPE 26. 

C preprocessor: MMIXAL 3. 

c.param: MMIX-CONFIG 13 . 

cache: MMIX-CONFIG 16, 23, 31, MMIX- 

PIPE lOT, 168, 169, 170, 171, 172, 173, 
174, 175, 176, 178, 179, 180, 181, 182, 

183, 184, 185, 192, 193, 195, 196, 198, 
199, 200, 201, 202, 203, 204, 205, 215, 
217, 222, 224, 237, 326. 

cache.addr: MMIX-PIPE 192, 193, 196, 201, 

205, 217. 

cache.search: MMIX-PIPE 192 , 193, 195, 205, 

206, 217, 224, 233, 234, 262, 267, 268, 
271, 272, 273, 291, 292, 296, 302, 353, 354, 
365, 366, 367, 378, 379. 

cacheblock: MMIX-CONFIG 32, 33, MMIX- 

PIPE lOT, 169, 170, 171, 172, 178, 179, 

184, 185, 186, 187, 188, 189, 190, 191, 
192, 193, 195, 196, 198, 199, 200, 201, 
202, 203, 204, 205, 217, 222, 224, 232, 



237, 257, 258, 378, 379. 
caches: MMIX 30, MMIX-PIPE 163. 

cacheset: MMIX-CONFIG 32, MMIX-PIPE 167 . 

186, 187, 188, 189, 190, 191, 193, 194, 

196, 205. 

calloc: MMIX-CONFIG 16, 18, 26, 31, 32, 33, 

34, 36, 37, 38, mmix-io 12, mmix-pipb 213, 
MMIX-SIM 17, 24, 35, 41, 42, 77, mmixal 32, 
38, 55, 59, 84, mmotype 20. 
can complement . . . : MMIXAL 100. 

can compute. . . : MMIXAL 101. 

can divide. . . : MMIXAL 101. 

can multiply. . . : MMIXAL 101. 

can negate . . . : MMIXAL 100. 

can registerize. . . : MMIXAL 100. 

can take serial number. . . : MMIXAL 100. 

Can’t allocate. . . : MMIX-CONFIG 16, 18, 31, 

32, 33, 34, 36, 37, mmix-Sim 17, 24, 41. 

Can’t have another. . . : MMOTYPE 22. 

Can’t open. . . : MMIX-CONFIG 38, MMIX- 

SIM 24, MMIXAL 138, MMMIX 6, 9, 
MMOTYPE 3. 

Can’t write. . . : MMIXAL 47. 

cannot add. . . : MMIXAL 99. 

cannot subtract. . . : MMIXAL 101. 

cannot use. . . : MMIXAL 102. 

Capacity exceeded. . . : MMIXAL 38, 55, 59. 

carry : MMIX-ARITH 82. 

carry-save addition: MMIX 40. 

catchint: MMIX-SIM 147, 148 . 

cc: MMIX-CONFIG 16, 23, 31, 32, MMIX- 

PIPB 158, lOT, lOT, 177, 181, 184, 185, 222, 
224, 233, 234, 2M, 245, 357. 
cease : MMIX-PIPE U). 

ch: MMIXAL M, 57, 61, 74, 75, 79. 

Char: MMIX-SIM 39, 40, 41, MMIXAL 32, 

33, 37, 38, 40, 57, 62, 67, 68, 77. 

char.switch: MMIX-SIM 133 . 134. 

check.ld: MMIX-SIM 94, 96. 

check.st: MMIX-SIM 95. 

check.syntax : MMIX-SIM 149 . 

choose.victim: MMix-PiPB 186 . 187 . 196, 205. 

chunk: MMix-CONFiG 37, MMix-PiPB 206 . 209, 

210, 213, 216, 219, 223, 297, mmmix 8, 11. 
chunknode: MMIX-CONFIG 37, MMIX- 

PIPB 206, 207. 

citm: MMIX-CONFIG 13, 15, 23. 

clean.block: MMIX-PIPE rre, 179, 181, 276, 

365, 366, 367. 

clean.co: MMIX-PIPB 2^, 231, 361, 363, 

364, 368. 

clean.ctl: MMIX-PIPE 2^, 231, 361, 368. 

clean.lock: MMIX-PIPB 39, 230, 233, 234, 

361, 368. 

cleanup: MMIX-PIPE 179, 230, 231, 232. 

clearerr: MMIX-IO 13. 

Clock time is... : MMIX-PIPB 14. 

CMP : MMIX 15, MMIX-PIPB MMIX-SIM M, 

90, MMIXAL 63. 
cmp: MMIX-CONFIG 28, MMIX-PIPE 51, 143.

cmp.fin: MMIX-PIPB 348 , MMIX-SIM 90 . 

cmp.neg: MMix-PiPB 143 , 348, MMix-SiM 90 . 

cmp.pos: MMIX-PIPB 143 . 348, MMIX-SIM 90 . 

cmp.zero: MMIX-PIPB 143 . 348, MMIX-SIM 90. 

cmp.zero.or.invalid: MMIX-PIPB 348 , MMIX- 

SIM 90. 

CMPI : MMIX-PIPB MMIX-SIM 54, 90. 

CMPU : MMIX 9, 15, MMIX-PIPB MMIX- 

SIM 90, MMIXAL 63. 
cmpu: MMIX-CONFIG 28, MMIX-PIPB 

51, 143. 

CMPUI : MMIX-PIPB 47, MMIX-SIM M, 90. 

CO: MMIX-CONFIG 26, MMIX-PIPB 76, 81, 

82, 2OT, 243, 244. 
code: MMIXAL 62, 64.

command line arguments: MMIX-SIM 2, 6, 163. 

command_buf: MMIX-SIM 149, 150, 151.

command_buf_size: MMIX-SIM 150, 151.

commit_max: MMIX-CONFIG 15, MMIX-PIPE 59, 67, 145, 330.
compare-and-swap: MMIX 31. 

complement: MMIXAL 86, 100.

config file line... : MMIX-CONFIG 10.

config_file: MMIX-CONFIG 9, 10, 19, 38.

config_file_name: MMMIX 2, 3.

Configuration error... : MMIX-CONFIG 20, 23, 24, 25, 29, 31, 35, 37.

Configuration syntax error. . . : MMIX-CONFIG 19, 23.

confusion: MMIX-PIPB 1^, 28, 135, 185, 187. 

constant doesn’t fit... : MMIXAL 117. 

constant.found : MMIXAL 92, 93, 94, 95, 96. 

continuous profiling: MMIX 40. 

control: MMIX-CONFIG 37, MMIX-PIPE 45, 46, 60, 63, 73, 78, 124, 127, 158, 159, 167, 230, 235, 248, 254, 255, 285, 357.
control.struct: MMIX-PIPB 23, M. 

cool: MMIX-PIPB TO, 61, 63, 67, 69, 75, 78, 81, 

82, 84, 85, 86, 98, 99, 100, 102, 103, 104, 
105, 106, 108, 109, 110, 111, 112, 113, 114, 

117, 118, 119, 120, 121, 122, 123, 145, 152, 

158, 160, 227, 308, 309, 312, 314, 316, 322, 

323, 324, 332, 333, 334, 335, 337, 338, 339, 

340, 341, 347, 355, 372, mmmix 18, 19. 
cool_G: MMIX-PIPE 99, 102, 104, 105, 106, 110, 117, 119, 120, 312, 323, 335, 337.

cool_hist: MMIX-PIPE 74, 75, 99, 151, 152, 160, 308, 309, 316.

cool_L: MMIX-PIPE 99, 102, 104, 105, 106, 110, 112, 114, 119, 120, 312, 323, 337, 338.

cool_O: MMIX-PIPE 75, 98, 100, 104, 105, 106, 110, 112, 114, 117, 118, 119, 120, 145, 147, 333, 337, 338, 339, MMMIX 19.

cool_S: MMIX-PIPE 75, 98, 100, 110, 113, 114, 118, 119, 120, 145, 147, 337, MMMIX 19.

copy_block: MMIX-PIPE 184, 185, 217, 221.

copy_in_time: MMIX-CONFIG 16, 23, MMIX-PIPE 167, 217, 222, 224, 237, 276.



copy.outjime: MMIX-CONFIG 16, 23, MMIX- 

PIPB lOT, 203, 221, 233, 234, 259. 
coroutine: MMIX-CONFIG 26, 34, 36, MMIX-PIPE 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 44, 76, 124, 127, 167, 222, 224, 230, 235, 237, 248, 285.

coroutine_bit: MMIX-PIPE 8, 10, 125.

coroutine_struct: MMIX-PIPE 23.

cotm: MMIX-CONFIG 1^, 15, 23. 

count: MMIX-PIPB 216 . 219 . 223 . mmotypb 9, 

n, 12, 14, 25, 30. 

count_bits: MMIX-ARITH 26, 28, MMIX-PIPE 21, 344, MMIX-SIM 13, 87.
counting leading zeros: MMIX 28. 

counting ones: MMIX 12. 

counting trailing zeros: MMIX 37. 

CPV: MMIX-CONFIG 15, 16, 17, 23. 

CPV_size: MMIX-CONFIG 15, 17, 23.

cpv_spec: MMIX-CONFIG 1^, 15, 17. 

cset: MMIX-CONFIG 28, MMIX-PIPE 51, 345.

CSEV : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSEVI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSNI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSNN : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSNNI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSNP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSNPI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSNZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSNZI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSOD : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSODI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSP : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSPI : MMIX-PIPE 47, MMIX-SIM 54, 92.

CSWAP : MMIX 31, 50, MMIX-PIPE 271, 281, MMIX-SIM 54, 96, MMIXAL 63.

cswap: MMIX-PIPE 51, 110, 117, 283, 307.

CSWAPI : MMIX-PIPE 47, MMIX-SIM 54, 96.

CSZ : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 92, MMIXAL 63.

CSZI : MMIX-PIPE 47, MMIX-SIM 54, 92.

ctl: MMIX-CONFIG 16, MMIX-PIPE 23, 30, 31, 32, 44, 81, 124, 125, 128, 134, 222, 224, 231, 236, 243, 244, 245, 249, 255, 286.

ctl_change_bit: MMIX-PIPE 81, 85.

cur_arg: MMIX-SIM 142, 144, 163.

cur.dat: MMMIX 5, 8, 9, 10, 11, 12. 

cur_disp_addr: MMIX-SIM 151, 152, 156, 157, 159.

cur_disp_mode: MMIX-SIM 151, 152, 156, 157, 159.

cur_disp_set: MMIX-SIM 151, 152, 153, 156.

cur_disp_type: MMIX-SIM 151, 153, 159.

cur_file: MMIX-SIM 30, 32, 35, 42, 44, 45, 47, 49, 51, 53, 63, MMIXAL 38, 45, 50, MMOTYPE 15, 16, 17, 20, 21.

cur_greg: MMIXAL 108, 109, 134, 143.

cur_line: MMIX-SIM 30, 32, 35, 47, 51, 53, 63, 82, 83, 103, 105, 128, MMOTYPE 15, 16, 17, 20, 21.

cur_loc: MMIX-SIM 30, 32, 33, 34, 51, 162, 165, MMIXAL 42, 43, 49, 52, 53, 96, 107, 109, 110, 114, 115, 118, 125, 126, 130, 131, 132, MMMIX 5, 7, 8, 9, 11, 12, MMOTYPE 15, 16, 17, 18, 19, 21.

cur_O: MMIX-PIPE 46, 100, 145, 147, MMMIX 19.

cur_prefix: MMIXAL 56, 61, 87, 111, 132.

cur_round: MMIX-ARITH 30, 40, 43, 45, 46, 47, 86, 88, 89, 91, MMIX-PIPE 20, 346, MMIX-SIM 13, 77, 89, 100, 158.

cur_S: MMIX-PIPE 46, 100, 145, 147, MMMIX 19.

cur_seg: MMIX-SIM 151, 152, 161.

cur_time: MMIX-PIPE 28, 29, 125.

eyes: MMIX-PIPB 9, 

d: MMIX-ARITH 13, 27, Mi MMIX-PIPE 28i 

97, 170, 197, 201, 203, 220, mmmix M- 

D_BIT : MMIX-ARITH MMIX-PIPE M, 308, 

343, MMIX-SIM W, 88, MMIXAL 69. 
D_Handler : MMIXAL 69. 

Danger : MMIXAL 142. 

dat: MMIX-ARITH 59, 60, 61, 62, 63, 64, 65, 79, 80, 82, 83, MMIX-SIM 16, 20, 50, 51, 162, 165, MMIXAL 52.
data: MMIX-CONFIG 31, 32, 33, MMIX- 

PIPE 124, 125, 130, 131, 132, 133, 134, 
135, 137, 138, 139, 140, 141, 142, 143, 
144, 155, 156, 160, 167, 172, 179, 185, 
197, 201, 203, 215, 216, 217, 218, 219, 220, 
222, 223, 224, 225, 226, 232, 233, 234, 237, 
239, 243, 244, 245, 257, 259, 260, 261, 262, 
264, 265, 266, 267, 268, 269, 270, 271, 272, 
273, 274, 275, 276, 277, 278, 279, 280, 281, 
282, 283, 288, 289, 291, 292, 293, 294, 295, 
296, 297, 298, 300, 301, 302, 304, 307, 308, 
309, 310, 313, 325, 326, 327, 328, 329, 330, 
331, 336, 342, 343, 344, 345, 346, 348, 350, 
351, 352, 353, 354, 356, 357, 358, 359, 360, 
361, 363, 364, 365, 366, 367, 368, 369, 370, 
378, 379, MMMIX 12, 23. 

Data_Segment : MMIX-SIM 3, MMIXAL 69. 

Dcache: MMIX-CONFIG 17, 21, 35, 36, MMIX-PIPE 39, 128, 168, 215, 217, 222, 227, 228, 233, 234, 257, 259, 261, 262, 263, 265, 267, 268, 271, 273, 274, 275, 276, 280, 360, 364, 366, 378, 379, MMMIX 21.

Dclean: MMIX-PIPE 233.

Dclean_inc: MMIX-PIPE 233.

Dclean_loop: MMIX-PIPE 233.

dd: MMIX-PIPE 197, 203.



dderr: MMIXAL 114. 

Dean, Jeffrey Adgate: MMIX 40. 

dec.pt: MMIX-ARITH 73, 74, TO, 77, 79. 

decgamma: MMIX-PIPE 49, 114, 147, 327.

decimal: MMIX-SIM 133, 135, 137.

decimal-to-binary conversion: MMIX-ARITH 68. 

default.go : MMIX-PIPB 

DEFINED : MMIXAL 74, 78, 87, 91, 109, 115. 

defval: MMIX-CONFIG 12, 13, 14, 16, 17.

deissues: MMIX-PIPE 60, 61, 63, 64, 67, 145, 160, 308, 309, 316, MMMIX 18.

del: MMIX-PIPE 216.

delay: MMIX-PIPE 219.

delink: MMIXAL 99, 100. 

delta: MMix-ARiTH 6, 93, 94, mmix-pipe 21 . 

MMIX-SIM 1^, 25, 34, MMIXAL 28, 
mmmix 25, mmotype 1, 8, 19. 
demote_and_fix: MMIX-PIPE 198, 199, 233, 234, 268, 271, 273, 353, 354, 365, 366, 367.

demote_usage: MMIX-PIPE 190, 191, 199.

denin: MMIX-PIPB M, 100, 133, 346, 348. 

denin_penalty: MMIX-CONFIG 15, MMIX-PIPE 279, 346, 348, 350.

denout: MMIX-PIPE^, 100, 133, 134, 346. 

denout_penalty: MMIX-CONFIG 15, MMIX-PIPE 281, 346, 349, 351.

derr: MMIXAL 86, 97, 100, 101, 102, 103, 104, 109, 116, 117, 121, 122, 123, 124, 129.

dest: MMIXAL 126, 131.

die: MMIX-PIPE M4, 160, 265, 308, 309, 310. 

dig: MMIX-SIM 15 . 

dirty: MMIX-CONFIG 31, 32, 33, MMIX- 

PIPB lOT, 170, 172, 179, 181, 185, 197, 201, 
203, 216, 221, 259, 262. 
dirty_only: MMIX-PIPE 176, 177.

dispatch_count: MMIX-PIPE 64, 65, 81.

dispatch_done: MMIX-PIPE 101, 112, 113, 114, 332.

dispatch_lock: MMIX-PIPE 39, 64, 65, 75, 81, 85, 310, 329, 330, 356.

dispatch_max: MMIX-CONFIG 15, 37, MMIX-PIPE 59, 74, 75, 85, 162.

dispatch_stat: MMIX-CONFIG 37, MMIX-PIPE 64, 162.

Ditzel, David Roger: MMIX 42. 

DIV : MMIX 20, 50, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.

div: MMIX-CONFIG 15, 27, 28, MMIX-PIPE 7, 49, 51, 121, 343.

DIVI : MMIX-PIPE 47, MMIX-SIM 54, 88.

divide check exception: MMIX 20, 32.

division by zero : MMIXAL 101.

DIVU : MMIX 20, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.

divu: MMIX-CONFIG 28, MMIX-PIPE 51, 121, 343.

DIVUI : MMIX-PIPE 47, MMIX-SIM 54, 88.

Dlocker: MMIX-PIPE 127, 128, 276. 

do_resume_trans: MMIX-PIPE 325, 326.

do_syncd: MMIX-PIPE 280, 364, 369.

do_syncid: MMIX-PIPE 280, 364, 369.

doing_interrupt: MMIX-PIPE 63, 64, 65, 314, 317, 318.

done-. MMix-ARiTH 65, mmix-io 12, 13, 

MMIX-PIPE 125 . 134, 233, 234, mmmix 1^. 
done-with.write: MMIX-PIPE 256 . 

down: MMIX-PIPE 86, 89, 95, 97, 116. 

dpanic: MMIXAL 47, 138, 142. 

DPTco: MMIX-PIPB 235, 236, 237. 

DPTctl: MMIX-PIPE 235 . 236. 

DPTname: MMIX-PIPE 2^, 236. 

DT_hit: MMIX-PIPE 267, 268, 270, 271, 272, 273.

DT_miss: MMIX-PIPE 267, 270, 272.

DT_retry: MMIX-PIPE 272.

DTcache: MMIX-CONFIG 17, 21, 35, MMIX-PIPE 39, 128, 168, 236, 237, 266, 267, 268, 270, 272, 325, 353, 358, MMMIX 21, 23.
dump: MMIX-SIM 164, 165.

dump_file: MMIX-SIM 144, 146, 164, 166.

dump_tet: MMIX-SIM 164, 165, 166.

DUNNQ : MMIX-PIPE 2M, 255, 268, 270, 271, 278. 

dynamic traps: MMIX 35, 37. 

e: MMIX-ARITH 40, 56, 

E_BIT : MMIX-ARITH 93, 94, MMIX-PIPB M, 

56, 306, 314, 317, 351. 
ee: MMIX-ARITH OT, 51, 53. 

ef: MMIX-ARITH OT, 51, 53. 

Emerson, Ralph Waldo: MMIX 7. 

emulate_virt: MMIX-PIPE 272, 310, 327.

emulation: MMIX 24, 27, 33, 36, 38, 47, 49, MMIX-CONFIG 6.

end_simulation: MMIX-SIM 141, 149.

EOF : MMIXAL 34, 35, MMMIX 10. 

eof: MMIX-IO ]A, 15, 16, 17. 

eps: MMIX-PIPE "My MMIX-SIM 13 . 

equiv: MMIXAL 58, 59, 64, 66, 70, 75, 76, 

78, 87, 94, 98, 99, 100, 101, 104, 108, 

109, 110, 113, 114, 116, 117, 118, 121, 

122, 123, 124, 125, 126, 127, 129, 130, 

131, 132, 134, 144. 
equiv_buf: MMOTYPE 28, 29.

err: MMIXAL 35, 45, 93, 95, 98, 99, 100, 101, 106, 108, 109, 117, 118, 121, 122, 123, 124, 126, 127, 129, 131, 132, MMOTYPE 13, 14, 18, 19, 20, 22.

err_buf: MMIXAL 32, 33, 45.

err_count: MMIXAL 45, 79, 142, 145.

Error in tetra. . . : MMOTYPE 14.

errprint_coroutine_id: MMIX-PIPE 24, 25, 28.

errprintO: MMIX-CONFIG 8, 18, 24, 35, 36, 

37, MMIX-PIPE 1^, 22, 25. 
errprintl : MMIX-CONFIG 8, 10, 16, 19, 20, 23, 

24, 25, 29, 31, 32, 33, 34, 38, mmix-pipe 1^, 
14, 28, 213. 

errprint2: MMIX-CONFIG 8, 20, 23, 31, 32, 33, 

MMIX-PIPB 1^, 14, 25, 210. 
errprint3: MMIX-CONFIG 8, 32.



es: MMIX-ARITH OT. 

ESPEC : MMIXAL 20, 43, 62, 63, 132. 

et: MMIX-ARITH OT. 

ex: MMIX-ARITH 90. 

exc: MMIX-SIM 60, 61, 84, 85, 87, 88, 89, 90, 

95, 108, 122, 123, 126, 131. 
exceptions: MMIX 32. 

exceptions: MMIX-ARITH 31, 32, 33, 35, 36, 37, 

38, 40, 41, 42, 44, 46, 68, 86, 88, 89, 90, 
91, 93, 94, MMIX-PIPE 20, 281, 346, 350, 
351, MMIX-SIM 13, 89, 95. 
exec-bit: MMIX-SIM 58, 63, 161, 162. 

exit: MMIX-CONFIG 8, 38, MMIX-PIPE 14, 

MMIX-SIM 7, 14, 24, 26, 35, 143, 164, 
MMIXAL 45, 137, 142, MMMIX 3, 6, 7, 8, 9, 
11, 12, MMOTYPE 2, 3, 9, 20, 23, 25, 26. 
exp: MMIX-ARITH 71, 73, 76, 77, 79, 83, 84. 

exp.sign: MMIX-ARITH 77 . 

expanding: MMIXAL 127, 137, 139 . 

expire: MMIX-PIPE 13, 14 . 

Extern: MMIX-PIPE 4, 5, 9, 29, 38, 59, 60, 

66, 69, 77, 86, 87, 98, 115, 136, 150, 161, 
168, 175, 178, 180, 207, 209, 211, 212, 214, 
242, 247, 252, 284, 349. 

Extra bytes follow. . . : MMOTYPE 30. 

/: MMIX-ARITH 37, 

61 . 62 . 82 . MMIX-IO 10, MMIX-PIPB re, 
MMIX-SIM 62. 

F_BIT : MMIX-PIPE 54, 122, 256, 302, 306, 309, 

310, 313, 314, 317, 320, 321, 327. 

FADD : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fadd: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346.

fake_stdin: MMIX-SIM 144, 145.

false: MMIX-ARITH 1, 24, 68, MMIX-CONFIG 10, 

15, MMIX-PIPE n, 12, 59, 75, 81, 100, 112, 
113, 114, 146, 147, 170, 179, 201, 203, 

205, 221, 244, 259, 269, 301, 304, 314, 
323, 324, 330, 332, 337, 340, 351, 363, 369, 
MMIX-SIM 9, 49, 51, 60, 87, 88, 90, 128, 

131, 133, 141, 143, 149, 150, 152, 153, 161, 
MMIXAL 26, 34, 64, 66, 70, 118, 125, 130, 

132, MMMIX 8, 9, 10, 11, 12, 21, 22, 23. 

Fascicle 1: MMIX 4, MMIX-SIM 165, MMMIX 9. 

Fclose : MMIXAL 69. 

Fclose: MMIX-PIPE 371 . 372, MMIX-SIM M, 

108. 

fclose: MMIX-IO 8, 11, MMIX-SIM 4, 32, 145, 

150, MMMIX 4. 

FCMP : MMIX 23, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.

fcmp: MMIX-CONFIG 28, MMIX-PIPE 51, 348.

FCMPE : MMIX 25, MMIX-PIPE 348, MMIX-SIM 54, 90, MMIXAL 63.
fcomp: MMIX-ARITH re, MMIX-PIPE 21, 346, 

348, MMIX-SIM re, 89, 90. 
FDIV : MMIX 22, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fdiv: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 51, 346.

fdivide: MMIX-ARITH MMIX-PIPE 21, 346, 

MMIX-SIM 1^, 89. 

feof: MMIX-IO 15, 17, MMIX-SIM 42.

feps: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 348.

fepscomp: MMIX-ARITH 50, MMIX-PIPE 21, 

348, MMIX-SIM 13, 90. 

FEQL : MMIX 23, MMIX-PIPE 47, MMIX-SIM 54, 90, MMIXAL 63.

FEQLE : MMIX 25, MMIX-PIPE 47, 348, MMIX-SIM 54, 90, MMIXAL 63.

ferror: MMIX-IO 13.

fetch: MMIX-CONFIG 37, MMIX-PIPE 69, 70, 73, 74, 301, MMMIX 22.

fetch_bot: MMIX-CONFIG 37, MMIX-PIPE 69, 73, 74, 75, 301, MMMIX 22.

fetch_buf_size: MMIX-CONFIG 15, 37.

fetch.co: MMIX-PIPE 2^, 286, 287. 

fetch.ctl: MMIX-PIPE 2^, 286. 

fetch.hi: MMIX-PIPE 285, 294, 297, 301. 

fetchjo: MMIX-PIPE 2^, 294, 297, 301, 304. 

fetch_max: MMIX-CONFIG 15, MMIX-PIPE 59, 284, 301.

fetch_one: MMIX-PIPE 301.

fetch_ready: MMIX-PIPE 285, 291, 292, 296, 297, 299, 301.

fetch_retry: MMIX-PIPE 298, 300.

fetch.top: MMIX-CONFIG 37, MMIX-PIPE ra, 

71, 73, 74, 75, 301, mmmix 22. 

fetched: MMIX-CONFIG 37, MMIX-PIPE 284 . 

285, 294, 297, 301, 304. 

ff: MMIX-ARITH 63, 64, 65, 79, 80, 83. 

fflush: MMIX-IO 18, 19, 20, MMIX-PIPE 387, 

MMIX-SIM 4, 120, 150, MMMIX 13. 
fgetc: MMIXAL 34, 35, MMMIX 10.

Fgets : MMIX-SIM 4, MMIXAL 69. 

Fgets: MMIX-PIPE 3^, 372, MMIX-SIM 108. 

fgets: MMIX-CONFIG 10, 38, MMIX-IO 15, MMIX-MEM 2, MMIX-PIPE 387, MMIX-SIM 4, 42, 45, 120, 150, MMIXAL 34, MMMIX 6, 13.
Fgetws : MMIX-SIM 4, MMIXAL 69. 

Fgetws: MMIX-PIPE 371, 372, MMIX-SIM 59, 

108. 

fgetws: MMIX-SIM 4. 

file: MMIX-SIM 4. 

File. . .was modified : MMIX-SIM 44. 

file.info: MMIX-SIM 35, 42, 44, 45, 49. 

filejname: mmotype 15, 1^, 20, 21. 

file^no: MMIX-SIM 1^, 30, 51, 63. 

flle.node: MMIX-SIM 40. 

filename: MMIX-CONFIG 38, MMIXAL 36, 

38, 45, 50, 140. 

filename.count: MMIXAL 37, 38, 140. 

FILENAME_MX : MMIX-IO 2, 8, MMIXAL 38, 

139. 



filename_passed: MMIXAL 50, 51.

fill_from_mem: MMIX-CONFIG 35, MMIX-PIPE 129, 222, 224, 237.

fill_from_S: MMIX-CONFIG 35, MMIX-PIPE 129, 224, 237.

fill_from_virt: MMIX-CONFIG 35, MMIX-PIPE 129, 237, 242.

fillJock: MMIX-PIPE lOT, 174, 222, 224, 225, 

226, 237, 257, 261, 272, 274, 298, 300. 
filler: MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 195, 196, 204, 218, 224, 225, 261, 272, 274, 276, 298, 300, 326.

filler_ctl: MMIX-CONFIG 16, MMIX-PIPE 167, 176, 225, 236, 261, 272, 274, 298, 300, 326.

fin_bflot: MMIX-PIPE 346.

fin_bin: MMIXAL 99, 101.

fin.ex: MMIX-PIPE 135, M4, 155, 266, 269, 

271, 272, 273, 274, 276, 279, 281, 283, 
296, 298, 300, 301, 313, 325, 326, 327, 
328, 329, 331, 336, 342, 345, 346, 350, 351, 
356, 360, 363, 364, 370. 
fin_float: MMIX-SIM 89.

fin_flot: MMIX-PIPE 346.

fin_ld: MMIX-PIPE 279, MMIX-SIM 94.

fin_pst: MMIX-SIM 95.

fin_st: MMIX-PIPE 281, MMIX-SIM 95.

fin_uflot: MMIX-PIPE 346.

fin_unifloat: MMIX-SIM 89.

finish_store: MMIX-PIPE 272, 279, 280.

FINT : MMIX 22, 24, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fint: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 346, 347.

fintegerize: MMIX-ARITH 88, MMIX-PIPE 21, 346, MMIX-SIM 13, 89.

first: MMIX-PIPE 216.

FIX : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fix: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 346, 347.

fix_o: MMIXAL 58, 112, 118.

fix_xyz: MMIXAL 114, 130.

fix.yz: MMIXAL M, 114, 125. 

fixit: MMIX-ARITH MMIX-PIPE 21, 346, 

MMIX-SIM 13, 89. 

fixr: MMIX-SIM 34, MMOTYPE 19. 

FIXU : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.
flags: MMIX-PIPE 80, 81, 312, 320, 

MMIX-SIM 60, M, 65. 
float-to-fix exception: MMIX 27, 32. 

floating: MMIX-SIM 134, 135, 137. 

floating point arithmetic: MMIX 21. 

floatit: MMIX-ARITH MMIX-PIPE 21, 346, 

MMIX-SIM 13, 89. 

FLOT : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

flot: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 346, 347.






FLOTI : MMIX-PIPE 47, MMIX-SIM 54, 89.

FLOTU : MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

FLOTUI : MMIX-PIPE 47, MMIX-SIM 54, 89.

flush_cache: MMIX-PIPE 202, 203, 205, 217, 233, 234, 263.

flushjistingjine-. MMIXAL M, 42, 44, 45, 

115, 132, 134, 136. 

flush_to_mem: MMIX-CONFIG 35, MMIX-PIPE 129, 215.

flush_to_S: MMIX-CONFIG 35, MMIX-PIPE 129, 217.

flusher: MMIX-CONFIG 16, 35, MMIX-PIPE 167, 176, 202, 203, 204, 205, 215, 217, 221, 233, 234, 259, 263.

flusher_ctl: MMIX-CONFIG 16, MMIX-PIPE 167.

fmt_style: MMIX-SIM 135, 137.

FMUL : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fmul: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 346.

fmult: MMIX-ARITH M, MMIX-PIPB 2]_, 346, 

MMIX-SIM 1^, 89. 

Fopen : MMIX-SIM 4, MMIXAL 69. 

Fopen -. MMIX-PIPB 371 , 372, mmix-Sim M, 108. 
fopen: MMIX-CONFIG 38, MMIX-IO 8, MMIX- 

SIM 4, 24, 49, 145, 146, 150, MMIXAL 138, 
MMMIX 6, 9, MMOTYPE 3. 
forced traps: MMIX 35, 36. 

forward_local: MMIXAL 90, 91, 111, 145.

forward_local_host: MMIXAL 88, 90, 91.

found: MMIX-SIM 21. 

fp: MMIX-IO 5, 7, 8, 10, 11, 13, 15, 17, 18, 

19, 20, 21, 22, 23. 

fpack: MMIX-ARITH 34, 36, 39, 43, 45, 46, 

47, 49, 84, 87, 89, 92, 94. 
fplus: MMIX-ARITH MMIX-PIPB 21, 346, 

MMIX-SIM 1^, 89. 

fprintf: MMIX-CONFIG 8, MMIX-IO 23, MMIX-PIPE 13, 381, 384, MMIX-SIM 14, 24, 26, 35, 44, 49, 143, 145, 146, MMIXAL 30, 35, 41, 42, 44, 45, 78, 79, 80, 115, 132, 134, 137, 142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11, 12, MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30.
fputc: MMIX-SIM 133, 137, 138, 156, 159, 166. 

Fputs : MMIX-SIM 4, MMIXAL 69. 

Fputs: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.

fputs : MMIX-SIM 4.

Fputws : MMIX-SIM 4, MMIXAL 69.

Fputws: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.

fputws : MMIX-SIM 4.

frac: MMIXAL 97, 101.

frame pointer: MMIXAL 18.

Fread : MMIX-SIM 4, MMIXAL 69.

Fread: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.

fread: MMIX-IO 13, 17, MMIX-PIPB 387, 

MMIX-SIM 4, 26, 120, MMOTYPE 9, 30. 
free: MMIX-IO 12, 13, MMIX-SIM 24. 



freeze_dispatch: MMIX-PIPE 75, 81, 118, 355.

FREM : MMIX 22, 34, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

frem: MMIX-CONFIG 27, 28, MMIX-PIPE 49, 51, 320, 350, 351.

frem_max: MMIX-CONFIG 15, MMIX-PIPE 349, 351.

fremstep: MMIX-ARITH 93, MMIX-PIPB 2]_, 350, 

351, MMIX-SIM 1^, 89. 
freopen: MMIX-SIM 4, 49. 

freq: MMIX-SIM 1^, 50, 51, 63, 130. 

froot: MMIX-ARITH 91, MMIX-PIPB 21, 346, 

MMIX-SIM 1^, 89. 

Fseek : MMIX-SIM 4, MMIXAL 69. 

Fseek: MMIX-PIPE 371, 372, MMIX-SIM 108.

fseek: MMIX-IO 21, MMIX-SIM 4, 45.

FSQRT : MMIX 22, 28, 50, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fsqrt: MMIX-CONFIG 15, 28, MMIX-PIPE 7, 49, 51, 346, 347.

FSUB : MMIX 22, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

fsub: MMIX-CONFIG 28, MMIX-PIPE 51, 346.

Ftell : MMIX-SIM 4, MMIXAL 69. 

Ftell: MMIX-PIPE 371, 372, MMIX-SIM 108.

ftell: MMIX-IO 22, MMIX-SIM 4, 42. 

ftype: MMIX-ARITH 37, 38, 39, 40, 41, 

44, 46, 85, 86, 88, 91, 93. 

FUN : MMIX 23, MMIX-PIPE 348, MMIX-SIM 54, 90, MMIXAL 63.

func: MMIX-CONFIG 18, MMIX-PIPE 75, 76, 77, 79.

func_struct: MMIX-PIPE 76.

FUNE : MMIX 25, MMIX-PIPE 348, MMIX-SIM 54, 90, MMIXAL 63.

funeq: MMIX-CONFIG 28, MMIX-PIPE 51, 348.

funit: MMIX-CONFIG 18, 25, 26, 29, MMIX-PIPE 77, 79, 82, MMMIX 24.
funit.count: MMIX-CONFIG 18, 19, 25, 26, 

MMIX-PIPB TI_, 79, 82, MMMIX 24. 
funpack: MMIX-ARITH 36, 37, 40, 41, 44, 46, 

50, 85, 86, 88, 91, 93. 

future reference cannot... : MMIXAL 109. 

future_bits: MMIXAL 116, 119, 120, 125, 130.

fwprintf : MMIXAL 30. 

Fwrite : MMIX-SIM 4, MMIXAL 69. 

Fwrite: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.

fwrite: MMIX-IO 18, 19, 20, MMIX-SIM 4, 

MMIXAL 47. 

G: MMIX 29, MMIX-SIM 75. 

g: MMIX-ARITH 56, 61, §2, MMIX-PIPB 86 . 

167 . 172 . MMIX-SIM 76. 
gap: MMIX-SIM 47, 48, 53, 128, 143. 

GET : MMIX 43, MMIX-PIPE 47, MMIX-SIM 54, 97, MMIXAL 63, MMMIX 12.

get: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 118, 146, 328.

get_int: MMIX-CONFIG 11, 20, 23, 24.
get.reader: MMIX-PIPB 1J2, 1J3, 233, 257, 266, 

267, 271, 272, 273, 288, 291, 296, 353, 354, 
358, 359, 360, 365, 366. 
get.token: MMix-CONFiG W, 11, 18, 19, 

22, 23, 25. 

GETA : MMIX 18, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

GETAB : MMIX-PIPE 47, MMIX-SIM 54, 85.

gg: MMIX-ARITH 63, 64, 65, MMIX-CONFIG 16, 23, 31, 35, MMIX-PIPE 167, 170, 172, 216.

Ghemawat, Sanjay: MMIX 40. 

Gill, Stanley: MMIX-ARITH 26. 

Gillies, Donald Bruce: MMIX-ARITH 26. 

global registers: MMIX 29. 

GO : MMIX 19, MMIX-PIPE 235, MMIX-SIM 54, 107, MMIXAL 63.

go: MMIX-CONFIG 16, 28, MMIX-PIPE 46, 49, 51, 85, 100, 119, 120, 122, 123, 128, 155, 160, 231, 236, 249, 286, 308, 312, 320, 321, 322, 327, 331, 364.

GOI : MMIX-PIPE 47, MMIX-SIM 54, 107.

good: MMIX-SIM 61, 93, 133, 138.

good_guesses: MMIX-SIM 93, 139, 140.

got_DT: MMIX-PIPE 272.

got_greg: MMIXAL 108.

got_IT: MMIX-PIPE 291, 298.

got_one: MMIX-PIPE 291, 300, 301.

gran: MMIX-CONFIG 13, 15, 23.

graphics: MMIX 11. 

Gray, James Nicholas: MMIX 31. 

GREG : MMIXAL 18, 62, 63, 102, 109, 132. 

greg: MMIXAL 108, 127, 142, 1^, 144. 

greg.val: MMIXAL 108, 127, 133 , 144. 

h: MMIX-ARITH 3, MMIX-IO 3, MMIX- 

PIPE 17_, 1^, 1^, 210, 213 , MMIX-SIM U), 
MMIXAL 26, 68, MMOTYPE 7. 

H_BIT : MMIX-PIPE M, 146, 306, 308, 313, 

314, 317, 319, 320, 321, mmix-Sim M, 

108, 122, 123. 
h_down: MMIX-PIPE 152.

h_up: MMIX-PIPE 152.

Halt : MMIX-SIM 7, MMIXAL 69.

Halt: MMIX-PIPE 371, 372, MMIX-SIM 108.

halted: MMIX-PIPE 10, 12, 356, 373, MMIX-SIM 107, 109, 140, 141, 161.
handle: MMIX-IO 8, 11, 12, 13, 14, 15, 16, 17, 

18, 19, 20, 21, 22, MMIX-SIM 4, 134, 1^, 137. 
handlers: MMIX 32, 35, 38. 

hardware_PT: MMIX-CONFIG 15, 37.

hash_prime: MMIX-CONFIG 15, 37, MMIX-PIPE 207, 209, 210, 213.
head: MMIX-PIPE M, 71, 73, 74, 75, 80, 81, 

84, 85, 100, 110, 114, 151, 152, 160, 228, 
229, 301, 308, 309, 316, 323, 335, 341, 
MMMIX 12, 15, 22. 

heldj)its: MMIXAL 43, 44, 47, 49, 52. 



Hennessy, John LeRoy: MMIX 1, 3, MMIX-PIPE 2, 58, 150, 163.

Henzinger, Monika Hildegard Rauch: MMIX 40. 

hex: MMIX-SIM 134, 135, 137. 

Hexadecimal file line... : MMMIX 6. 

hexadecimal files: MMMIX 5. 

hi: MMIX-SIM 1^. 

hist: MMIX-PIPE 44, 46, 75, 85, 100, 160, 308, 309.

hit: MMIX-PIPE 193.

hit_and_miss: MMIX-PIPE 267, 268, 271, 273.

hit_set: MMIX-PIPE 192, 193, 194, 196, 199, 201, 217.

hold_buf: MMIXAL 43, 44, 47, 52.

hold_op: MMIXAL 98.

holding.time: MMix-GONFiG 15, MMIX- 

PIPB 2£7, 256, 257. 

hot: MMIX-PIPE 61, 63, 64, 67, 69, 86, 101, 146, 147, 149, 255, 256, 314, 316, 317, 318, 319, 320, 321, 357, MMMIX 18, 19.
i: MMIX-ARITH 8, 1^, MMIX-CONFIG 38, 

MMIX-PIPB 10, 12, 44, 172, 176, m, 185 , 
201 . 246, MMIX-SIM 62. 

I can’t allocate... : MMIX-PIPE 213.

I can’t deal with. . . : MMIXAL 50. 

I can’t open. . . : MMIX-SIM 49. 

I’m reading this file... : MMOTYPE 23. 

I/O: MMIX 33, 37, 44, MMIX-IO 1, MMIX-MEM 1, MMIX-SIM 4.

I_BIT : MMIX-ARITH 31, 40, 41, 42, 44, 

46, 86, 88, 91, 93, mmix-pipb 54, 348, 
MMIX-SIM W, 90, MMIXAL 69. 

I_Handler : MMIXAL 69. 

IBM Corporation: MMIX 31. 

Icache: MMIX-CONFIG 17, 21, 35, 36, 37, MMIX-PIPE 39, 128, 168, 222, 227, 229, 265, 280, 291, 292, 294, 296, 300, 359, 364, 365, MMMIX 21.
IEEE/ANSI Standard 754: MMIX 21.

Ihit_and_miss: MMIX-PIPE 291, 292, 296, 298, 299.

ii: MMIX-PIPE 185, 216.

IIADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.

IIADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.

illegal character constant : MMIXAL 106. 

illegal fraction: MMIXAL 101. 

illegal hexadecimal constant : MMIXAL 95. 

illegal instructions: MMIX 28, 29, 33, 37, 

38, 43, 45, 51. 

illegal_inst: MMIX-PIPE 118, 347, MMIX-SIM 89, 97, 99, 100, 102, 104, 107, 124, 125.

immed_bit: MMIXAL 62, 121, 124.

immediate operands: MMIX 5, 13. 

implied.loc: MMIX-SIM 51, M, 53. 

Improper hexadecimal. . . : MMMIX 6, 7, 8. 

improper local label... : MMIXAL 103. 

inbuf: MMIX-CONFIG 31, MMIX-PIPE 167, 200, 201, 219, 220, 222, 223, 226, 245, 379.

incgamma: MMIX-PIPE 113, 147, 323, 327, 338.

INCH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

INCL : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

incl_file: MMIX-SIM 150, 151.

incl_read: MMIX-SIM 150.

INCMH : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

INCML : MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

incomplete. . .constant : MMIXAL 106.

incomplete_str: MMIX-SIM 149, 155.

Incorrect implementation. . . : MMIX-PIPE 22, MMIX-SIM 14.
incr: MMIX-ARITH 6, 33, 53, 55, 73, 87, 92, MMIX-IO 4, 14, 16, 18, 19, 20, MMIX-PIPE 21, 46, 64, 84, 85, 100, 113, 114, 119, 120, 236, 240, 265, 279, 301, 314, 320, 322, 323, 325, 333, 338, 339, 369, 370, 373, 380, 381, 382, 383, 384, 385, 386, MMIX-SIM 13, 30, 33, 34, 37, 51, 60, 63, 70, 82, 83, 93, 101, 102, 103, 104, 105, 106, 108, 109, 115, 116, 118, 119, 127, 140, 152, 154, 155, 156, 162, 163, 165, MMIXAL 28, 47, 52, 94, 95, 107, 126, 131, MMMIX 12, 21, MMOTYPE 8, 15, 18, 19.

increase_L: MMIX-PIPE 110, 312.

increment ... too large: MMOTYPE 19. 

incrl: MMIX-PIPE 49, 112, 119, 327. 

inexact exception: MMIX 21, 32. 

Inf : MMIXAL 69. 

inf: MMIX-ARITH 36, 37, 38, 39, 40, 41, 42, 

44, 46, 50, 85, 86, 88, 91, 93. 
inf.octa: MMIX-ARITH 4, 39, 41, 44, 46. 

infinity: MMIX 21. 

info: MMIX-SIM 51, 60, 65, 71, 79, 127, 

130, 131. 

initialization of a user program: MMIX- 

SIM 6, 164. 

inner_lp: MMIXAL 86, 98.

inner_rp: MMIXAL 97, 98.

Input is not. . . : MMOTYPE 23.

input/output: MMIX 33, 37, 44, MMIX-IO 1, MMIX-MEM 1, MMIX-SIM 4.

inst: MMIX-PIPE 73, 75, 84, 100, 110, 114, 228, 229, 304, 323, 335, 341, MMIX-SIM 60, 61, 63, 70, 108, 123, 130, MMMIX 12, 22.
inst.ptr: MMIX-PIPE 71, 73, 81, 85, 119, 120, 

122, 123, 160, 2M, 288, 290, 294, 301, 

302, 304, 308, 309, 310, 312, 314, 322, 

323, MMIX-SIM 37, 60, 61, 63, 70, 93, 101, 
107, 108, 123, 124, 131, 138, 140, 161, 164, 
MMMIX 12, 15, 22, 23. 

INT_MAX : MMIX-CONFIG 15, 38. 

int_op: MMIX-CONFIG 27, 28.

int_stages: MMIX-CONFIG 27, 28.

interact: MMIX-SIM 149.

interact_after_break: MMIX-SIM 107, 141, 143.

interacting: MMIX-SIM 61, 107, 120, 141, 143.

interactive_help: MMIX-SIM 144, 149.

interactive_read_bit: MMIX-MEM 2, MMIX-PIPE 8.

interim: MMIX-PIPE 46, 81, 100, 112, 113, 114, 146, 227, 320, 330, 332, 337, 340, 342, 350, 351, 361, 363, 364, 369.

internal_op: MMIX-CONFIG 28, MMIX-PIPE 80, 320.

internal_op_name: MMIX-PIPE 46, 50.

internal_opcode: MMIX-CONFIG 14, 28, MMIX-PIPE 44, 49, 51, 246.
interrupt: MMIX-PIPE 46, 59, 73, 81, 

100, 118, 122, 132, 140, 141, 144, 146, 149, 
160, 256, 266, 269, 271, 281, 282, 288, 301, 
302, 304, 306, 307, 308, 309, 310, 313, 314, 
317, 319, 320, 321, 322, 323, 327, 329, 330, 
331, 332, 336, 337, 343, 346, 348, 350, 351, 
MMIX-SIM 141, 144 . 148, MMMIX 22. 
interrupts: MMIX 33, 34, 35, 36, 37, 38, 

MMIX-PIPE 306, MMIX-SIM 1, 2, 108. 
INTERVAL_TIMEDUT : MMIX-PIPE CT, 314. 

invalid exception: MMIX 21, 32. 

IPTco: MMIX-PIPE 2^, 236, 237. 

IPTctl: MMIX-PIPE 235 . 236. 

IPTname: MMIX-PIPE 235 , 236. 

IS : MMIXAL 16, §2, 63, 109, 132. 

is_dirty: MMIX-PIPE 169, 170, 177, 205, 233, 234.

is_load_store: MMIX-PIPE 307, 310, 316, 320.

is_subnormal: MMIX-PIPE 346, 348, 350, 351.

is_trivial: MMIX-PIPE 346, 350.

isalpha: MMIXAL 57. 

isdigit: MMIX-ARITH 68, 73, 74, 77, MMIX- 

SIM 152, 155, MMIXAL 38, 57, 86, 94, 103, 
104, 109, 110, 111. 
isletter: MMIXAL 57, 86, 103, 104. 

isspace: MMIX-CONFIG 10, 38, MMIX-SIM 150, 

MMIXAL 38, 103, 104, 106. 
issue_bit: MMIX-PIPE 8, 10, 81, 145, 146, 147, 149, 283, 310, 314, 319, 320, 321.

issued_between: MMIX-PIPE 158, 159, 160, 308, 309, 316.

isxdigit: MMIX-SIM 154, 161, MMIXAL 95.

IT_hit: MMIX-PIPE 291, 292, 295, 296, 298, 299.

IT_miss: MMIX-PIPE 291, 295, 298, 299.

ITcache: MMIX-CONFIG 17, 21, 35, MMIX- 

PIPE 39, 128, 168, 236, 237, 288, 291, 

292, 293, 295, 298, 302, 325, 354, 360, 
MMMIX 12, 21, 23. 

IVADDU : MMIX-PIPE 47, MMIX-SIM 54, 85.

IVADDUI : MMIX-PIPE 47, MMIX-SIM 54, 85.

j: MMIX-ARITH 8, 1^, MMIX-CONFIG 23, 

30, 31, 38, MMIX-PIPE 10, 12, M, 162 . 
170 . 172 . 176 . 179 . 181 . 183 . 185 . 189 . 

191 , 203 . MMIX-SIM 15, 50, §2, 162, 1^, 
MMIXAL M, 74, 136 . MMMIX 17,

24, MMOTYPB 26. 
jj\ MMIX-PIPE 185 , MMIXAL 
JMP : MMIX 19, MMIX-PIPE 47, MMIX-SIM 54, 70, 107, MMIXAL 63.

jmp: MMIX-PIPE 51, 84, 85, 327.

JMPB : MMIX-PIPE 47, MMIX-SIM 54, 70, 107.

just_traced: MMIX-SIM 128, 129.
fc: MMIX-ARITH 8, 1^, 29, 91, MMIX- 

CONFIG 31, MMIX-PIPE 76, MMIX-SIM 42, 
45, 47, 62, 82, 143, 160, mmixal 44, 

52, 136 , MMMIX \J_, MMOTYPE 26. 

K_BIT : MMIX-PIPE 54, 118, 322. 

keep: MMIX-PIPE 202, 203.

key: MMIX-PIPE 210, 213, MMIX-SIM 20, 21, 22.

kind-. MMIX- MEM ]_, 2. 

known: MMIX-PIPE 40, 43, 44, 46, 59, 85, 

89, 93, 100, 102, 112, 119, 120, 131, 132, 
133, 135, 144, 237, 244, 255, 265, 290, 

312, 322, 331, 338, 364. 
known_phys: MMIX-PIPE 296, 298.

Knuth, Donald Ervin: MMIX-ARITH 58. 

L: MMIX 29, MMIX-SIM 75. 

1: MMIX-ARITH 3, MMIX-CONFIG MMIX- 

lO 3, MMIX-PIPE 17_, 1J7, 189, 191 , 

MMIX-SIM 10, 22, 76, MMIXAL 26, 

52 . 68 . MMOTYPE 7. 

lab_field: MMIXAL 32, 33, 102, 103, 104, 109, 110, 111.

label field...ignored : MMIXAL 102.

label syntax error... : MMIXAL 103.

lastji: MMIX-PIPE 209, 210, 211, 213, 216, 

219, 223, 297, mmmix 8, 11. 
last^mem: MMIX-SIM 18, W, 20, 21. 

last.off: MMIX-PIPE 216 . 

last.sym.node : MMIXAL 59, 60 . 

lastJ,rie^node: MMIXAL 55, 

ld: MMIX-CONFIG 27, 28, MMIX-PIPE 51, 117, 265, 271, 307, 327, 357.
ld.ready: MMIX-PIPE 2§7_, 268, 270, 271, 273, 

274, 277, 278, 279. 
ld_retry: MMIX-PIPE 272, 273, 274.

ld_st_launch: MMIX-PIPE 265, 266, 354.

LDA : MMIX 7, MMIXAL 13, 18, 63.

LDB : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDBI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDBU : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDBUI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDHT : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDHTI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDO : MMIX 7, MMIX-PIPE 47, MMIX-SIM 54, 94, MMIXAL 63.

LDOI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDOU : MMIX 7, MMIX-PIPE 47, 114, 332, MMIX-SIM 54, 94, MMIXAL 63.

LDOUI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDPTE : MMIX-PIPE 235, 236, 279.

ldpte: MMIX-PIPE 49, 235, 236, 265.

LDPTP : MMIX-PIPE 235, 236, 279.

ldptp: MMIX-PIPE 235, 236, 265.

LDSF : MMIX 26, MMIX-PIPE 47, 271, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDSFI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDT : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDTI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDTU : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDTUI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDUNC : MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 94, MMIXAL 63.

ldunc: MMIX-PIPE 51, 117, 265, 268, 271, 273, 357.

LDUNCI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDVTS : MMIX 46, MMIX-PIPE 47, MMIX-SIM 54, 107, MMIXAL 63.

ldvts: MMIX-PIPE 51, 118, 265, 271, 352.

LDVTSI : MMIX-PIPE 47, MMIX-SIM 54, 107.

LDW : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDWI : MMIX-PIPE 47, MMIX-SIM 54, 94.

LDWU : MMIX 7, MMIX-PIPE 47, 279, MMIX-SIM 54, 94, MMIXAL 63.

LDWUI : MMIX-PIPE 47, MMIX-SIM 54, 94.

left: MMIX-SIM 16, 21, 22, 50, 162, 165, 

MMIXAL 57, 72, 73, 74. 
left_paren: MMIX-SIM 138, 139.

Leung, Shun-Tak Albert: MMIX 40.

lg: MMIX-CONFIG 30, 31.

lhs: MMIX-SIM 80, 101, 107, 131, 133, 138, 139.

lim: MMIX-PIPE 185.

line directives: MMIXAL 3.

line_count: MMIX-SIM 42, 45.

line_listed: MMIXAL 34, 41, 45, 136.

line.no: MMIX-SIM 1£, 30, 51, 63, MMIXAL 34, 

36, 38, 45, 50. 

line.shown: MMIX-SIM 45, 48, 51. 

link: MMIXAL 58, 59, 64, 66, 70, 74, 75, 78, 

87, 91, 94, 99, 100, 109, 110, 112, 116, 
118, 125, 130, 132, 145. 

Liptay, John S.: MMIX 30. 

list: MMIX-ARITH 2, MMIX-IO 2, MMIX-PIPE 6, 

MMIX-SIM 11, MMIXAL 31, MMOTYPE 5. 
listed.file: MMOTYPB 15, 16, 17, 21. 

listing: MMOTYPE 2, £ 13, 19, 21, 23, 24. 

listing.bits: MMIXAL 44, 47, 52, 136. 

listing.clear : MMIXAL 47, 52, 136. 

listing.file : MMIXAL 41, 42, 44, 45, 47, 52, 75, 

78, 80, 109, 115, 132, 134, 136, 138, 139. 
listing.loc: MMIXAL 42, 44. 

listingjname: MMIXAL 137, 138, 139 . 

literate programming: MMIXAL 3. 

little-endian versus big-endian: MMIX 6, 12, 

MMIX-IO 16, MMIX-PIPE 304, MMIXAL 47, 
MMMIX 10. 
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ll: MMix-SiM 30, 37, 62, 63, 94, 95, 

96, 103, 105, 111, 114, iiZ, 118, 119, 130, 
157, 159, 161, 163, 164. 
lo: MMIX-SIM 15.

load_cache: MMIX-PIPE 200, 201, 222, 224, 237.

load_sf: MMIX-ARITH MMIX-PIPE 21, 279,
MMIX-SIM 13, 94.

LOC: MMIXAL 16, 62, 63, 109, 132.

loc: MMIX-IO 23, MMIX-PIPE 46, 73,
80, 81, 84, 85, 100, 118, 119, 122, 144,
149, 151, 152, 160, 236, 266, 271, 296,
304, 320, 322, 323, 331, 355, 364, 368, 372,
MMIX-SIM 16, 18, 20, 21, 22, 30, 51, 60,
61, 63, 70, 101, 109, 127, 130, 162, 163,
165, MMMIX 12, 15, 22.

loc_implied: MMIX-SIM

LOCAL: MMIXAL 19, 62, 63, 132.

local registers: MMIX 29.

localtime: MMOTYPE 23.

lock: MMIX-PIPE 167, 174, 200, 217, 222, 224,
225, 226, 233, 234, 237, 257, 261, 266, 267,
271, 272, 273, 274, 276, 288, 291, 296, 300,
326, 353, 354, 358, 359, 360, 365, 366, 367.

lockloc: MMIX-PIPE 23, 37, 125, 145, 234, 257,
279, 287, 301, 360, 361, 364.

lockvar: MMIX-PIPE 65, 167, 214, 230, 247.

long_warning_given: MMIXAL 35, 36.

loop: MMIX-SIM 29, 36, MMOTYPE 13, 21.

lop_end: MMIX-SIM 23, MMIXAL 23, 24, 80, MMOTYPE 6, 22, 30.

lop_file: MMIX-SIM 23, 35, MMIXAL 23, 24, 50, MMOTYPE 6, 20.

lop_fixo: MMIX-SIM 23, 34, MMIXAL 23, 24, 113, MMOTYPE 6, 19.

lop_fixr: MMIX-SIM 23, 34, MMIXAL 23, 24, 114, MMOTYPE 6, 19.

lop_fixrx: MMIX-SIM 23, 34, MMIXAL 23, 24, 114, MMOTYPE 6, 19.

lop_line: MMIX-SIM 23, 35, MMIXAL 23, 24, 50, MMOTYPE 6, 20.

lop_loc: MMIX-SIM 23, 33, MMIXAL 23, 24, 49, MMOTYPE 6, 18.

lop_post: MMIX-SIM 23, 25, 29, MMIXAL 23, 24, 144, MMOTYPE 6, 22.

lop_pre: MMIX-SIM 23, 28, MMIXAL 23, 24, 141, MMOTYPE 6, 22, 23.

lop_quote: MMIX-SIM 23, 29, 33, 36,
MMIXAL 23, 24, 47, MMOTYPE 6, 13, 18, 21.

lop_quote_command: MMIXAL 47.

lop_skip: MMIX-SIM 23, 33, MMIXAL 23, 24, 49, MMOTYPE 6, 18.

lop_spec: MMIX-SIM 23, 36, MMIXAL 23, 24, 132, MMOTYPE 6, 21.

lop_stab: MMIX-SIM 23, MMIXAL 23, 24, 80, MMOTYPE 6, 22, 25.

lopcodes: MMIXAL 22.

lreg: MMIXAL 132, 142, 143.

lring_mask: MMIX-PIPE 89, 104, 105, 106,
110, 112, 113, 114, 117, 119, 120, 337, 338,
MMIX-SIM 72, 73, 74, 76, 77, 80, 81, 82,
83, 101, 102, 104, 157, 159.

lring_size: MMIX-CONFIG 15, 37, MMIX-PIPE 86, 88, 89, 114,
MMIX-SIM 72, 76, 77, 143, MMMIX 18.

lru: MMIX-CONFIG 22, MMIX-PIPE 164, 186,
187, 189, 191.
μ: MMIX 50, MMIX-SIM 1.

m: MMIX-ARITH 27, MMIX-PIPE 12, 187, 189,
191, 268, 270, 271, 278, 3«, 3M, MMIX-SIM 114, 117,
MMIXAL 74, MMMIX 16, MMOTYPE 26.

ma: MMIX-PIPE 372, 380, MMIX-SIM 61, 108,
111, 133, 136.

magic_done: MMIX-PIPE 372.

magic_offset: MMIX-ARITH 63.

magic_read: MMIX-PIPE 377, 378, 380, 381, 385.

magic_write: MMIX-PIPE 377, 379, 385, 386.

Main: MMIX-SIM 6, MMIXAL 21, 71.

main: MMIX-SIM 141, MMIXAL 136, MMMIX 2, MMOTYPE 1.

make_it_infinite: MMIX-ARITH 72, 79.

make_it_zero: MMIX-ARITH 79.

make_ld_ready: MMIX-PIPE 271.

make_map: MMIX-SIM 49.

make_two_three: MMIXAL 116.

many_arg_bit: MMIXAL 62, 116.

map: MMIX-SIM 45, 49.

marginal registers: MMIX 29.

mask: MMIX-ARITH 13, 18, MMIX-PIPE 282.

matrices of bits: MMIX 12.

max: MMIX-PIPE 268, 292.

max_cycs: MMIX-CONFIG 15, 23, 24, 36.

max_mem_slots: MMIX-CONFIG 15, MMIX-PIPE 89.

max_pipe_op: MMIX-CONFIG 27, MMIX-PIPE 133, 136.

max_real_command: MMIX-CONFIG 27, 28, MMIX-PIPE 81.

max_rename_regs: MMIX-CONFIG 15, MMIX-PIPE 89.

max_stage: MMIX-CONFIG 36, MMIX-PIPE 26, 129.

max_sys_call: MMIX-PIPE 371, 372, MMIX-SIM 59, 108.

maxval: MMIX-CONFIG 12, 1^, 20, 23.

mb: MMIX-PIPE 372, 380, MMIX-SIM 61, 108,
111, 133, 136.

McLellan, Hubert Rae, Jr.: MMIX 42.

mem: MMIX-PIPE 113, 114, 115, 116, 117,
227, 236, 246, 249, 254, 255, 265, 333,
334, 339, 355.

mem_addr_time: MMIX-CONFIG 15, 36, MMIX-PIPE 216, 219, 225,
260, 261, 271, 274, 277, 297, 300.

mem_bit: MMIXAL 62, 116, 124.

mem_bus_bytes: MMIX-CONFIG 15, 36.

mem_chunks: MMIX-CONFIG 37, MMIX-PIPE 207, 213.

mem_chunks_max: MMIX-CONFIG 15, 37,
MMIX-PIPE 206, 207, 213.

mem_direct: MMIX-PIPE 257.

mem_find: MMIX-SIM 20, 30, 37, 63, 82, 83,
94, 95, 96, 103, 105, 111, 114, 117, 130,
157, 159, 161, 163, 164.

mem_hash: MMIX-CONFIG 37, MMIX-PIPE 207,
209, 210, 213, 216, 219, 223, 297,
MMMIX 8, 11.

mem_lock: MMIX-PIPE 39, 214, 215, 219, 222,
225, 260, 261, 271, 274, 277, 297, 300.

mem_locker: MMIX-PIPE 127, 128, 219, 260,
271, 277, 297.

mem_node: MMIX-SIM 16, 17, 19, 20, 21,
22, 50, 162, 165.

mem_node_struct: MMIX-SIM 16.

mem_read: MMIX-PIPE 208, 209, 210, 219, 222,
271, 277, 297, 378, MMMIX 12, 18.

mem_read_time: MMIX-CONFIG 15, 36, MMIX-PIPE 214, 219, 222,
223, 271, 277, 297.

mem_root: MMIX-SIM 18, 19, 21, 53, 161, 164.

mem_slots: MMIX-PIPE 63, 89, 111, 145,
147, 256.

mem_tetra: MMIX-SIM 16, 20, 62, 82, 83,
114, 117.

mem_write: MMIX-PIPE 208, 212, 213, 216,
260, 379, MMMIX 8, 11, 12.

mem_write_time: MMIX-CONFIG 15, 36, MMIX-PIPE 214, 216, 260.

mem_x: MMIX-PIPE 44, 46, 100, 111, 113, 117,
123, 144, 145, 146, 147, 255, 327, 339, 355.

memory-mapped input/output: MMIX 44,
MMIX-MEM 1.

mems: MMIX 50, MMIX-SIM 1.

mems: MMIX-SIM §4, 127.

message: MMIXAL 45.

Metze, Gernot: MMIX 40.

mid: MMIXAL 57, 61, 72, 73, 74, 75, 80.

Miller, Jeffrey Charles Percy: MMIX-ARITH 26.

minus: MMIXAL 97, 101.

minus zero: MMIX 21, 22, 23.

minval: MMIX-CONFIG 12, 1^, 20, 23.

missing left parenthesis: MMIXAL 98.

missing right parenthesis: MMIXAL 98.

mm: MMIX-SIM 23, 28, 29, 36, MMIXAL 22, 47,
48, MMOTYPE 6, 13, 21, 23, 25, 30.

mmgetchars: MMIX-IO 4, 8, 18, 19, 20,
MMIX-PIPE 377, 381, MMIX-SIM 114.

MMIX binary file...: MMMIX 12.

mmix>: MMIX-SIM 3, 150.

MMIX_config: MMIX-CONFIG 8, MMIX-PIPE 1, 9, 23, 29, 49,
59, 136, 207, 259, MMMIX 2, 25.

mmix_fake_stdin: MMIX-IO W, MMIX-SIM 113, 145.

mmix_fclose: MMIX-IO 11, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fgets: MMIX-IO 14, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fgetws: MMIX-IO 16, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fopen: MMIX-IO 8, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fputs: MMIX-IO 19, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fputws: MMIX-IO 20, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fread: MMIX-IO 12, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fseek: MMIX-IO 21, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_ftell: MMIX-IO 22, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

mmix_fwrite: MMIX-IO 18, MMIX-PIPE 372, 376, MMIX-SIM 108, 113.

MMIX_init: MMIX-PIPE 1, 9, 10, MMMIX 2.

mmix_io_init: MMIX-IO 7, MMIX-SIM 113,
141, MMMIX 2, 25.

mmix_opcode: MMIX-CONFIG 28, MMIX-PIPE 44, 47, 75, 156, 157,
MMIX-SIM 62, 91.

MMIX_run: MMIX-CONFIG 28, MMIX-PIPE 1,
9, W, MMMIX 15.
mmmix>: MMMIX 13.

mmo_buf: MMIXAL 47, 48, 50.

mmo_byte: MMIXAL 74, 75, 80.

mmo_clear: MMIXAL 47, 49, 52.

mmo_cur_file: MMIXAL 50, 51, 141.

mmo_cur_loc: MMIXAL 47, 49, 53.

mmo_err: MMIX-SIM 26, 28, 29, 33, 34, 35.

mmo_file: MMIX-SIM 24, 25, 26, 32,
MMOTYPE 3, 4, 9, 30.

mmo_file_name: MMIX-SIM 24, 142.

mmo_line_no: MMIXAL 47, 50, 51.

mmo_load: MMIX-SIM 34.

mmo_loc: MMIXAL 53, 112, 132.

mmo_lop: MMIXAL 49, 50, 80, 113, 114,
141, 144.

mmo_lopp: MMIXAL 49, 50, 80, 114, 132.

mmo_out: MMIXAL 48, 50.

mmo_ptr: MMIXAL 47, 48, 80.

mmo_sync: MMIXAL 52, 132.

mmo_tetra: MMIXAL 49, 113, 114, 141, 144.

mmo_write: MMIXAL 47.

mmputchars: MMIX-IO 4, 12, 14, 16, MMIX-PIPE 377, 384,
MMIX-SIM 117, 163.

mod: MMIXAL 97, 101.

mode: MMIX-CONFIG 16, 23, 31, MMIX-IO 5,
7, 8, 11, 12, 14, 16, 18, 19, 20, 21, 22,
23, MMIX-PIPE 21, 167, 217, 257, 263,
MMIX-SIM 4, 13.

mode_code: MMIX-IO 8, 9.

mode_string: MMIX-IO 8, 9, MMIX-SIM 4.

MOR: MMIX 12, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

mor: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 344.

More ... chunks are needed: MMIX-PIPE 213.

MORI: MMIX-PIPE 47, MMIX-SIM 54, 87.

MSE: MMIX-SIM 4.

MUL: MMIX 20, 50, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.

mul: MMIX-CONFIG 27, 28, MMIX-PIPE 51, 343.

MULI: MMIX-PIPE 47, MMIX-SIM 54, 88.

multiprecision conversion: MMIX-ARITH 54, 68.

multiprecision division: MMIX-ARITH 13.

multiprecision multiplication: MMIX-ARITH 8.

MULU: MMIX 20, MMIX-PIPE 47, MMIX-SIM 54, 88, MMIXAL 63.

mulu: MMIX-PIPE 51, 121, 343.

MULUI: MMIX-PIPE 47, MMIX-SIM 54, 88.

mul0: MMIX-CONFIG 15, 27, MMIX-PIPE 49, 343.

mul1: MMIX-CONFIG 15, MMIX-PIPE 343.

mul2: MMIX-CONFIG 15, MMIX-PIPE 49.

mul3: MMIX-CONFIG 15, MMIX-PIPE 49.

mul4: MMIX-CONFIG 15, MMIX-PIPE 49.

mul5: MMIX-CONFIG 15, MMIX-PIPE 49.

mul6: MMIX-CONFIG 15, MMIX-PIPE 49.

mul7: MMIX-CONFIG 15, MMIX-PIPE 49.

mul8: MMIX-CONFIG 15, 27, MMIX-PIPE 49, 343.

MUX: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

mux: MMIX-CONFIG 15, 28, MMIX-PIPE 51, 142.

MUXI: MMIX-PIPE 47, MMIX-SIM 54, 87.

MXOR: MMIX 12, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

MXORI: MMIX-PIPE 47, MMIX-SIM 54, 87.

my_div: MMIX-PIPE 7.

my_fsqrt: MMIX-PIPE 7.

my_random: MMIX-PIPE 7.

myself: MMIX-SIM 142, 143, 144.

n: MMIX-ARITH 1^, MMIX-CONFIG 23, 30,
38, MMIX-IO 12, 14, 16, 18, 19, 20, 23,
MMIX-SIM 148, MMMIX 1^.

N_BIT: MMIX-PIPE 54, 271.

name: MMIX-CONFIG 12, 13, 14, 16, 18, 20,
23, 24, 25, 26, 29, 31, 32, 33, 34, 35, 36,
MMIX-IO 8, MMIX-PIPE 23, 25, 39, 76,
128, 167, 174, 176, 231, 236, 249, 286,
MMIX-SIM 4, 35, 44, 49, 51, §£, 130,
MMIXAL §2, 64, 68, 70, MMMIX 24.

name_buf: MMIX-IO 8.

NaN: MMIX 21.

NaN: MMIX-ARITH 68, 70, 73, 84.

nan: MMIX-ARITH 36, 37, 38, 39, 40, 42,
50, 85, 86, 88, 91.

NAND: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

nand: MMIX-CONFIG 28, MMIX-PIPE 51, 138.

NANDI: MMIX-PIPE 47, MMIX-SIM 54, 86.

need_b: MMIX-PIPE 44, 46, 100, 106, 108, 112,
113, 114, 131, 312, 345.

need_ra: MMIX-PIPE 46, 100, 108, 112,
113, 131, 324.

NEG: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

neg_one: MMIX-ARITH 4, 24, MMIX-IO 4, 8,
11, 12, 14, 15, 16, 17, 19, 20, 21, 22,
MMIX-PIPE 20, 22, 143, 236, 282, 372,
MMIX-SIM 13, 14, 53, 77, 90, MMIXAL 27,
29, MMMIX 12, 23, 25.

negate: MMIXAL 82, 86, 100.

negate_q: MMIX-ARITH 24.

negation, floating point: MMIX 13.

negative locations: MMIX 35, 40, 44.

NEGI: MMIX-PIPE 47, MMIX-SIM 54, 85,
MMMIX 12.

NEGU: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54, 85, MMIXAL 63.

NEGUI: MMIX-PIPE 47, MMIX-SIM 54, 85.

new_cache: MMIX-CONFIG 1^, 17, 21.

new_chunk: MMMIX 5, 6, 7, 8, 9, 11.

new_cool: MMIX-PIPE 75, 78, 101.

new_fetch: MMIX-PIPE 2^, 298, 301, 302.

new_head: MMIX-PIPE 75, 81, 85, 120.

new_inst_ptr: MMMIX 15.

new_L: MMIX-PIPE 120.

new_link: MMIXAL 109, 110, 115.

new_mem: MMIX-SIM 17, 18, 21.

new_mode: MMIX-SIM 152.

new_O: MMIX-PIPE 75, 99, 100, 119, 120,
333, 334, 338, 339.

new_Q: MMIX-PIPE 146, 148, 149, 310,
314, 329.

new_S: MMIX-PIPE 75, 99, 100, 113, 114,
333, 334, 339.

new_sym_node: MMIXAL 64, 66, 70, 71,
87, 111, 118, 125, 130.

new_tail: MMIX-PIPE 301, MMMIX 22.

new_trie_node: MMIXAL 55, 57, 61.

next: MMIX-PIPE 23, 26, 28, 32, 33, 35, 82,
125, 134, 145, 176, 183, 196, 202, 205,
217, 218, 221, 225, 233, 234, 259, 261,
263, 266, 272, 274, 276, 298, 300, 326,
350, 361, 363, 364, 368.

next_char: MMIX-ARITH 68, 69, 71, 72, 73, 77,
MMIX-SIM 13, 152, 153, 154, 155, 161.

next_sym_node: MMIXAL 59, 60.

next_sync: MMIX-PIPE 364.

next_trie_node: MMIXAL 55, 56.

next_val: MMIXAL 99, 101.

NNIX operating system: MMIX 2.

no base address...: MMIXAL 127.

No file was selected...: MMOTYPE 20.

No name given...: MMOTYPE 20.

no opcode...: MMIXAL 104.

No room...: MMIX-SIM 35, 42, 77, MMIXAL 32,
84, MMOTYPE 20.

no-op: MMIX 49.

no_const_found: MMIX-ARITH 68.

no_hardware_PT: MMIX-CONFIG 37, MMIX-PIPE 242, 272, 298.

no_label_bit: MMIXAL 62, 102.

NONEXISTENT_MEMORY: MMIX-PIPE 57.

Nonzero byte follows...: MMOTYPE 30.

noop: MMIX-CONFIG 28, MMIX-PIPE 51,
80, 118, 122, 322, 323, 327, 332, 337.

noop_inst: MMIX-PIPE 118, 227.

NOR: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

nor: MMIX-CONFIG 28, MMIX-PIPE 51, 138.

NORI: MMIX-PIPE 47, MMIX-SIM 54, 86.

normal numbers: MMIX 21.

not a valid prefix: MMIXAL 132.

note_usage: MMIX-PIPE 188, 189, 190, 196.

noted: MMIX-PIPE 73, 75, 85, 304, 323,
MMMIX 22.

null string...: MMIXAL 93.

nullifying: MMIX-PIPE 75, 85, 146, 147,
310, 315, 316.

num: MMIX-ARITH 36, 37, 38, 39, 40, 41, 42,
44, 46, 50, 85, 86, 88, 91, 93.

NXOR: MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

nxor: MMIX-CONFIG 28, MMIX-PIPE 51, 138.

NXORI: MMIX-PIPE 47, MMIX-SIM 54, 86.

nybble: MMIX 6, 11.

O: MMIX-SIM 75.

o: MMIX-ARITH 29, 34, 47, 88, MMIX-IO 12, M, 1^, W, 20, 22,
MMIX-PIPE W, 157, 246, MMIX-SIM 12, 15, 91, m, 1^, 154,
160, MMIXAL 114, 127, MMOTYPE 8.

O_BIT: MMIX-ARITH 33, 35, MMIX-PIPE 54,
MMIX-SIM 57, MMIXAL 69.

O_Handler: MMIXAL 69.

oand: MMIX-ARITH 25, MMIX-PIPE 21, 241,
MMIX-SIM 13, MMIXAL 28, 107.

oandn: MMIX-ARITH 25, MMIX-PIPE 21, 146,
240, 241, 279, 325.

obj_file: MMIXAL 47, 138, 139.

obj_file_name: MMIXAL 47, 137, 138, 139.

obj_time: MMIX-SIM 28, 31, 44.

object files: MMIXAL 22.

OCTA: MMIXAL 17, 62, 63, 117, 118.

octa: MMIX-ARITH 3, 4, 5, 6, 7, 8, 9, 12, 13,
24, 25, 29, 31, 34, 37, 38, 39, 40, 41, 44, 46,
47, 50, 54, 56, 69, 85, 86, 87, 88, 89, 91, 93,
MMIX-CONFIG 31, 32, 33, 37, MMIX-IO 3,
4, 8, 11, 12, 14, 16, 18, 19, 20, 21, 22,
23, MMIX-MEM 1, 2, 3, MMIX-PIPE 9, 10,
17, 18, 19, 20, 21, 40, 44, 46, 68, 87, 90,
91, 98, 99, 141, 148, 156, 157, 167, 192,
193, 197, 201, 203, 204, 205, 206, 208, 209,
210, 212, 213, 216, 219, 220, 237, 238, 239,
240, 241, 246, 254, 255, 268, 270, 271, 278,
282, 284, 297, 372, 373, 376, 377, 378, 379,
380, 381, 384, MMIX-SIM 10, 12, 13, 15,
16, 19, 20, 31, 50, 52, 61, 76, 77, 91, 113,
114, 117, 137, 140, 151, 154, 160, 162, 165,
MMIXAL 26, 27, 28, 43, 49, 51, 58, 82, 83,
114, 126, 127, 131, 133, MMMIX 5, 16, 17,
20, 25, MMOTYPE 7, 8, 16.
octabyte: MMIX 6.

odd: MMIX-ARITH 93, 94, 95.

ODIF: MMIX 11, MMIX-PIPE 47, MMIX-SIM 54, 87, MMIXAL 63.

odif: MMIX-CONFIG 28, MMIX-PIPE^, 51, 344.

ODIFI: MMIX-PIPE 47, MMIX-SIM 54, 87.

odiv: MMIX-ARITH 13, 24, 45, MMIX-PIPE 21,
343, MMIX-SIM 13, 88, MMIXAL 28, 101.

off: MMIX-PIPE 185, 210, 213, 216, 219,
223, 226.

offset: MMIX-IO 21, MMIX-SIM 4, 20, 154.

old_hot: MMIX-PIPE 64, 276, 283, 310, 322,
328, 329, 342, 351, 353, 356, 364.

old_L: MMIX-SIM 60, 61, 98, 132.

old_tail: MMIX-PIPE 64, 69, 70, 74, 75, 85,
160, 308, 309.

ominus: MMIX-ARITH 5, 12, 24, 47, 53, 73, 88,
89, 92, 94, 95, MMIX-IO 4, 12, 18,
MMIX-PIPE 21, 139, 140, 344, MMIX-SIM 13, 85, 87,
MMIXAL 28, 49, 100, 101, 114, 126, 127, 131.

omult: MMIX-ARITH 8, 12, 43, MMIX-PIPE 21,
343, MMIX-SIM 13, 88, MMIXAL 28, 101.

one_arg_bit: MMIXAL 62, 116.

oo: MMIX-ARITH 34, 47, 49, 53, OT.

oops: MMIX 50, MMIX-SIM 1.

oops: MMIX-SIM 6£, 127, MMMIX U).

Oops... too long: MMOTYPE 26.

OP: MMIX-CONFIG 1^, 17, 24.

op: MMIX-PIPE 44, 46, 75, 80, 81, 82, 84, 85,
100, 102, 103, 108, 109, 112, 113, 114, 117,
124, 139, 151, 152, 155, 156, 157, 236, 256,
271, 279, 281, 282, 312, 320, 321, 327, 332,
339, 344, 345, 346, 348, MMIX-SIM 60, 62,
65, 70, 71, 78, 79, 85, 87, 89, 91, 92, 93,
94, 95, 123, 126, 127, 130, 131.

OP codes: MMIX 5.

OP codes, table: MMIX 51.

op_bits: MMIXAL 102, 104, 105, 107, 116, 121,
122, 123, 124, 129.

op_field: MMIXAL 32, 102, 104, 116, 121,
122, 123, 124, 129.

op_info: MMIX-SIM 64, 65.

op_init_size: MMIXAL 63, 64.

op_init_table: MMIXAL 64.

op_ptr: MMIXAL 85, 86, 98, 101.

op_root: MMIXAL 56, 61, 64, 80, 104.

OP_size: MMIX-CONFIG 15, 17, 24.

op_spec: MMIX-CONFIG 1£, 15, 17,
MMIXAL 62, 63, 64.

op_stack: MMIXAL 81, 82, 83, 84, 85, 86,
98, 101.

opcode: MMIXAL 102, 104, 105, 109, 117, 118,
119, 121, 124, 126, 127, 128, 129, 131, 132.

opcode syntax error...: MMIXAL 104.

opcode ... operand(s): MMIXAL 116.

opcode_name: MMIX-PIPE 48, 73.

open: MMIX-ARITH 65.

operand of ‘BSPEC’...: MMIXAL 132.

operand...register number: MMIXAL 129.

operand_list: MMIXAL 32, 85, 86, 106.

operands_done: MMIXAL 98.

operating system: MMIX 2, 29, 30, 33, 35, 37,
38, 43, 44, 47, MMIX-PIPE 243.

oplus: MMIX-ARITH 5, 47, 53, 73, MMIX-IO 4,
MMIX-PIPE 21, 139, 140, 241, 265,
331, MMIX-SIM 13, 60, 84, 85, 101, 154,
MMIXAL 28, 94, 99.

ops: MMIX-CONFIG 18, 25, 29, MMIX-PIPE 76,
79, 82.

OR: MMIX 10, MMIX-PIPE 47, 114, MMIX-SIM 54, 86, MMIXAL 63.

or: MMIX-CONFIG 28, MMIX-PIPE 51, 114,
138, MMIXAL 97, 101.

ORH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63, 128.

ORI: MMIX-PIPE 47, MMIX-SIM 54, 86, 126, 131.

origin: MMIX-ARITH 63, 64, 65.

ORL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63, 128.

ORMH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

ORML: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

ORN: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54, 86, MMIXAL 63.

orn: MMIX-CONFIG 28, MMIX-PIPE 51, 138.

ORNI: MMIX-PIPE 47, MMIX-SIM 54, 86.

out_stab: MMIXAL 74, 75, 80.

outbuf: MMIX-CONFIG 31, MMIX-PIPE 167,
176, 202, 203, 205, 215, 216, 217, 218,
219, 221, 259, 378, 379.

outer_lp: MMIXAL 85, 98.

outer_rp: MMIXAL 82, 97, 98.

over: MMIXAL 97, 101.

overflow: MMIX 8, 9, 20, 21, 22, 32.

overflow: MMIX-ARITH 9, 12, 24, MMIX-PIPE 20, 21, 343,
MMIX-SIM 13, 88, MMIXAL 28.

owner: MMIX-PIPE 44, 46, 63, 67, 73, 81, 124,
134, 144, 145, 244, 314, 357.

oxor: MMIX-ARITH 25.

oO: MMIX-ARITH 34.

p: MMIX-ARITH W, 62, 70,
MMIX-CONFIG 10, 11, MMIX-IO 13,
14, 1^, MMIX-PIPE 26, 28, 33,
73, 120, 170, 172, r[9, 1^, m, 189,
191, 193, 196, 199, m, 203, 205, 2^,
255, 256, 258, 378, 379, 381, 384, 387,
MMIX-SIM 17, 20, OT, 62, 114, 117,
120, 154, 162, 165, MMIXAL W, 59,
MMMIX IT, MMOTYPE L



P_BIT: MMIX-PIPE 54, 81, 149, 160, 322,
331, MMMIX 15.

pack_bytes: MMIX-PIPE 320, 335, 341.

packit: MMIX-ARITH 71, 78, 79.

page coloring: MMIX-PIPE 268, 292.

page fault: MMIX 37.

page table entry: MMIX 45.

page table pointer: MMIX 45.

page_b: MMIX-PIPE 238, 239, 243, 244,
MMMIX 12, 25.

page_bad: MMIX-PIPE 238, 239, 266, 288,
MMMIX 12, 23, 25.

page_f: MMIX-PIPE 238, 239, 272, 298.

page_mask: MMIX-PIPE 238, 239, 240, 241,
279, 325, MMMIX 12, 23, 25.

page_n: MMIX-PIPE 238, 239, 240, 279.

page_r: MMIX-PIPE 238, 239, 244, MMMIX 12, 25.

page_s: MMIX-PIPE 238, 239, 243, 268, 292,
MMMIX 12, 25.

panic: MMIX-CONFIG 8, 10, 16, 18, 19, 20,
23, 24, 25, 29, 31, 32, 33, 34, 35, 36, 37,
38, MMIX-PIPE 13, 22, 28, 135, 185, 187,
213, MMIX-SIM 14, 17, 24, 41, 42, 77, 120,
MMIXAL 29, 32, 38, 45, 50, 55, 59, 84.

PARITY_ERROR: MMIX-PIPE 57.

pass_after: MMIX-PIPE 125, 134, 266, 268,
270, 271, 288, 350, 353.

pass_data: MMIX-PIPE 134, 135.

passit: MMIX-PIPE 134, 266, 268, 270, 271,
288, 350, 353, MMIX-SIM 161.

Patterson, David Andrew: MMIX 1, MMIX-PIPE 2, 58, 150, 163.

PBEV: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBEVB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNN: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNNB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNP: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNPB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBNZ: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBNZB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBOD: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBODB: MMIX-PIPE 47, MMIX-SIM 54, 93.

PBP: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBPB: MMIX-PIPE 47, MMIX-SIM 54, 93.

pbr: MMIX-CONFIG 28, MMIX-PIPE 49, 51,
81, 85, 106, 152, 155.

PBZ: MMIX 17, MMIX-PIPE 47, MMIX-SIM 54, 93, MMIXAL 63.

PBZB: MMIX-PIPE 47, MMIX-SIM 54, 93.

pcs: MMIX-CONFIG 21, 23.

peek_hist: MMIX-PIPE 68, 74, 75, 85, 99,
100, 151, 152.

peekahead: MMIX-CONFIG 15, MMIX-PIPE 59, 74.

performance monitoring: MMIX 40.

permission bits: MMIX 37, 46.

phys_addr: MMIX-PIPE 240, 2^, 269, 292,
295, 298.

physical addresses: MMIX 44, 45, 47.

pipe_bit: MMIX-PIPE 8, 10.

pipe_limit: MMIX-CONFIG 24, MMIX-PIPE 136.

pipe_seq: MMIX-CONFIG 17, 24, 27, MMIX-PIPE 133, 134, 136, 141.

pixels: MMIX 11.

plus: MMIXAL 8^, 97, 99.

policy: MMIX-PIPE 186, 187, 189, 191.

Pool_Segment: MMIX-SIM 3, 6, 37, 163,
MMIXAL 69.

POP: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pop: MMIX-CONFIG 28, MMIX-PIPE 46,
51, 85, 114, 120, 331.

pop_unsave: MMIX-PIPE 120, 332.

population counting: MMIX 12.

ports: MMIX-CONFIG 16, 23, 34, MMIX-PIPE 128, 167, 183.

postamble: MMIX-SIM 25, 29, 32, MMOTYPE 1, 22.

power-saver mode: MMIX 31.

POWER_FAILURE: MMIX-PIPE 57.

power_of_two: MMIX-CONFIG 12, 1^, 20, 23.

pp: MMIX-ARITH 61, 80, MMIX-PIPE 184,
185, MMIXAL 59, 64, 66, 70, 74, 78, 87,
104, 109, 110, 111, 112, 118, 125, 130.

ppol: MMIX-CONFIG 22, 23.

PR_BIT: MMIX-PIPE 266, 269.

prec: MMIXAL 83.

precedence: MMIXAL 85.

predef_size: MMIXAL 69, 70.

predef_spec: MMIXAL 69, 70.

PREDEFINED: MMIXAL 64, 66, 70, 87, 109.

predefined symbols: MMIXAL 10, 67, 69.

predefs: MMIXAL 69, 70.

predicted: MMIX-PIPE 151.

PREFIX: MMIXAL 16, 62, 63, 129, 132.

PREGO: MMIX 30, MMIX-PIPE 47, 235, MMIX-SIM 54, 106, MMIXAL 63.

prego: MMIX-CONFIG 28, MMIX-PIPE 49, 51,
81, 227, 265, 271, 288, 289, 294, 296,
298, 300, 301.

PREGOI: MMIX-PIPE 47, MMIX-SIM 54, 106.

PRELD: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.

preld: MMIX-PIPE 51, 81, 227, 265, 266,
269, 271, 272, 273, 274.

PRELDI: MMIX-PIPE 47, MMIX-SIM 54, 106.

Premature end of file...: MMMIX 10.



PREST: MMIX 30, 34, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.

prest: MMIX-PIPE 51, 81, 227, 265, 269,
271, 272, 273, 274, 275.

prest_span: MMIX-PIPE 275, 276.

prest_win: MMIX-PIPE 267, 276.

PRESTI: MMIX-PIPE 47, MMIX-SIM 54, 106.

print_bits: MMIX-PIPE 46, K, 73.

print_cache: MMIX-PIPE 175, 176, MMMIX 21.

print_cache_block: MMIX-PIPE 171, 172, 177.

print_cache_locks: MMIX-PIPE 39, 173, 174.

print_control_block: MMIX-PIPE 63,
81, 125, 145, 146, 147.

print_coroutine_id: MMIX-PIPE 24, 25, 28, 33,
63, 73, 81, 125, 145.

print_fetch_buffer: MMIX-PIPE 72, 73, 253.

print_float: MMIX-ARITH 54, 59, MMIX-SIM 13,
137, 159.

print_freqs: MMIX-SIM M, 53.

print_hex: MMIX-SIM 12, 137, 138, 159.

print_int: MMIX-SIM 15, 137, 159.

print_line: MMIX-SIM 47.

print_locks: MMIX-PIPE 10, 38, MMMIX 21.

print_octa: MMIX-PIPE 1^, W, 43, 46, 73, 91,
146, 149, 152, 160, 176, 251, 283, 310,
314, 319, 320, 321.

print_pipe: MMIX-PIPE 10, 252, 253, MMMIX 21.

print_reorder_buffer: MMIX-PIPE 62, 253.

print_spec: MMIX-PIPE 46.

print_specnode: MMIX-PIPE 43, 46.

print_specnode_id: MMIX-PIPE 43, 73, 90, 91.

print_stab: MMOTYPE 25, 26.

print_stats: MMIX-PIPE 161, 162, MMMIX 2, 21.

print_string: MMIX-SIM 159, 160.

print_trip_warning: MMIX-IO 23, MMIX-PIPE 373, 376,
MMIX-SIM 109, 113.

print_write_buffer: MMIX-PIPE 250, 251, 253.

printf: MMIX-ARITH 54, 55, 57, 67, MMIX-MEM 2, 3,
MMIX-PIPE 10, 19, 25, 28, 33,
39, 43, 46, 56, 63, 73, 81, 91, 125, 145,
146, 147, 149, 152, 160, 162, 172, 174, 176,
177, 251, 283, 310, 314, 319, 320, 321, 387,
MMIX-SIM 12, 15, 45, 47, 49, 51, 53, 82,
83, 103, 105, 120, 128, 130, 131, 132, 133,
137, 138, 140, 143, 149, 150, 159, 160, 162,
MMMIX 2, 13, 14, 15, 18, 19, 22, 23, 24,
MMOTYPE 9, 15, 19, 21, 23, 24, 25, 28, 30.

priority: MMIX-SIM 17, 1^, 21.

privileged instructions: MMIX 37.

privileged operations: MMIX 31, 33, 37, 43, 44.

privileged_inst: MMIX-PIPE 118, 355,
MMIX-SIM 60, 94, 95, 97, 107, 108, 109.

profile_gap: MMIX-SIM 53, 143.

profile_showing_source: MMIX-SIM 53, 143.

profile_started: MMIX-SIM 51, 52.

profiling: MMIX-SIM 141, 143, 144.

prog_file: MMMIX 4, 5, 6, 9, 10.

prog_file_name: MMMIX 2, 3, 4, 6, 9, 10, 11.

program counter: MMIX-PIPE 284.

PROT_OFFSET: MMIX-PIPE 269, 293, 298.

protection bits: MMIX 37, 46.

protection fault: MMIX 45.

prototypes for functions: MMIX-ARITH 2,
MMIX-PIPE 6.

prts: MMIX-CONFIG 1^, 15, 23.

prune: MMIXAL 73, 80.

PRW_BITS: MMIX-PIPE 266, 269.

pseudo_lru: MMIX-CONFIG 22, MMIX-PIPE 164,
186, 187, 189, 191.

pseudo_op: MMIXAL 62.

pst: MMIX-PIPE m, 51, 117, 254, 265, 266,
271, 280, 321, 357.

PTE: MMIX 45, 47.

PTP: MMIX 45, 47.

ptr_a: MMIX-CONFIG 16, MMIX-PIPE 44, 114,
117, 215, 217, 222, 224, 227, 236, 237, 249,
254, 255, 325, 326, 333, 334.

ptr_b: MMIX-PIPE 44, 217, 218, 222, 224, 225,
232, 233, 234, 237, 257, 261, 262, 272,
274, 298, 300, 326.

ptr_c: MMIX-PIPE 44, 224, 225, 236, 237.

pure: MMIXAL 87, 94, 99, 100, 101, 110,
116, 124, 129.

push: MMIX-SIM 101.

push_pop_bit: MMIX-SIM 65, 132.

PUSHGO: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pushgo: MMIX-CONFIG 28, MMIX-PIPE
51, 85, 110, 119, 331.

PUSHGOI: MMIX-PIPE 47, MMIX-SIM 54, 101.

PUSHJ: MMIX 29, MMIX-PIPE 47, MMIX-SIM 54, 101, MMIXAL 63.

pushj: MMIX-CONFIG 28, MMIX-PIPE 49, 51,
85, 110, 119, 327.

PUSHJB: MMIX-PIPE 47, MMIX-SIM 54, 101.

PUT: MMIX 43, MMIX-PIPE 47, MMIX-SIM 54, 97, MMIXAL 63.

put: MMIX-CONFIG 28, MMIX-PIPE 51,
118, 146, 149, 329.

PUTI: MMIX-PIPE 47, MMIX-SIM 54, 97,
MMMIX 12.

PV: MMIX-CONFIG 15, 17, 20.

PV_size: MMIX-CONFIG 1^, 17, 20.

pv_spec: MMIX-CONFIG 12, 15, 17.

PW_BIT: MMIX-PIPE 54, 266, 269.

PX_BIT: MMIX-PIPE 54, 269, 293, 298, 301.

q: MMIX-ARITH 1^, 24, 61, 62, TO,
MMIX-CONFIG 10, MMIX-PIPE 196, 205,
255, 256, 258, m, 379, MMIX-SIM 21,
MMIXAL

qhat: MMIX-ARITH 13, 20, 21, 22, 23.

qloop: MMIX-PIPE 255.

qq: MMIX-ARITH 62, MMIXAL 112,
113, 114, 118, 125, 130.

quantify_mul: MMIX-PIPE 343.

queuelist: MMIX-PIPE 34, 125.

quiet NaN: MMIX 21.



r: MMIX-ARITH 31, 34, 62, 86, 88, 91,
MMIX-PIPE 35, 93, 95, 189, 191, MMIX-SIM TO, 22.

rA: MMIX 21, 22, 32, 38.

rA: MMIX-PIPE 52, 107, 108, 146, 324, 329,
334, 342, MMIX-SIM 55, 72, 97, 103, 105,
122, 131, 151, 158.

ra: MMIX-PIPE 44, 46, 59, 100, 108, 131, 144,
307, 308, 324, 346.

radix conversion: MMIX-ARITH 54, 68,
MMMIX 17.

random: MMIX-CONFIG 16, 22, MMIX-PIPE 7,
164, 167, 186, 187.

rank: MMIX-PIPE 167, 172, 186, 187, 188,
189, 191.

rB: MMIX 35.

rB: MMIX-PIPE 86, 310, 312, 319, MMIX-SIM 55, 72, 102, 104, 123, 151.

rBB: MMIX 36, 38.

rBB: MMIX-PIPE 312, 319, 322, 372, 380,
MMIX-SIM 108, 151.

rC: MMIX 45.

rC: MMIX-PIPE 269, MMIX-SIM 151.

rD: MMIX 20.

rD: MMIX-PIPE 52, 107, MMIX-SIM 55, 66, 151.

rE: MMIX 25.

rE: MMIX-PIPE 52, 107, 108, MMIX-SIM 55,
66, 151.

read_bit: MMIX-SIM 83, 161, 162.

read_byte: MMIX-SIM 27, MMOTYPE TO, 26,
27, 28, 30.

read_hex: MMIX-MEM 1, 2, MMMIX 15,
17, 18, 22.

read_tet: MMIX-SIM 26, 27, 28, 29, 33, 34,
35, 36, 37, MMOTYPE 9, 10, 13, 18, 19,
20, 21, 23, 24, 25, 30.

reader: MMIX-CONFIG 34, MMIX-PIPE 128, 167,
183, 233, 257, 266, 267, 271, 272, 273, 288,
291, 296, 353, 354, 358, 359, 360, 365, 366.

ready: MMIX-SIM 150.

REBOOT_SIGNAL: MMIX-PIPE 57.

recycle_fixup: MMIXAL 59, 112.

redefinition...: MMIXAL 109.

reg_val: MMIXAL 82, 87, 98, 99, 100, 101, 109,
110, 118, 121, 122, 123, 124, 129.

REGISTER: MMIXAL 58, 74, 78, 87, 109, 110.

register number...: MMIXAL 98, 118.

register stack: MMIX 29, 42, 43.

register_truth: MMIX-PIPE 155, 156, 157, 345,
MMIX-SIM 91, 92, 93.

registerize: MMIXAL 86, 100.

rel_addr_bit: MMIX-PIPE 75, 83, 106,
MMIX-SIM 60, 65, MMIXAL 62, 124, 129.

relative address...: MMIXAL 114, 126, 131.

release_lock: MMIX-PIPE 37, 222, 226, 233,
234, 272, 298, 356.

ren_a: MMIX-PIPE 44, 46, 100, 111, 117,
119, 121, 123, 144, 145, 146, 147, 312,
322, 334, 340.

ren_x: MMIX-PIPE 44, 46, 100, 110, 111, 112,
114, 118, 119, 120, 123, 144, 145, 146, 147,
236, 312, 322, 333, 334, 338.
rename registers: MMIX-PIPE 44, 86.

rename_regs: MMIX-PIPE 63, 89, 111,
145, 146, 147.

reorder_bot: MMIX-CONFIG 37, MMIX-PIPE 60,
63, 67, 75, 145, 159, 318, 357, MMMIX 18.

reorder_buf_size: MMIX-CONFIG 15, 37.

reorder_top: MMIX-CONFIG 37, MMIX-PIPE 60,
61, 63, 67, 75, 145, 159, 318, 357, MMMIX 18.

repeating: MMIX-SIM 149, 152, 153, 156.

repl: MMIX-CONFIG 16, 23, MMIX-PIPE 167,
196, 199, 205.

replace_policy: MMIX-CONFIG 22, MMIX-PIPE 164, 167,
186, 187, 188, 189, 190, 191.

report_error: MMIXAL

res: MMIX-PIPE 93.

resum: MMIX-PIPE 67, 314, 323, 325.

RESUME: MMIX 38, 49, 50, MMIX-PIPE 47,
304, 323, MMIX-SIM 60, 124, 125, 130,
MMIXAL 63, MMMIX 12.

resume: MMIX-CONFIG 28, MMIX-PIPE 51,
85, 149, 322, 323, 325.

RESUME_AGAIN: MMIX-PIPE 320, 323,
MMIX-SIM 71, 125, 130, 164.

resume_again: MMIX-PIPE 323.

RESUME_CONT: MMIX-PIPE 323, 364,
MMIX-SIM 125, 126, 130.

RESUME_SET: MMIX-PIPE 307, 320, 323, 324,
MMIX-SIM 122, 125, 126, 130.

resume_simulation: MMIX-SIM 149.

RESUME_TRANS: MMIX-PIPE 242, 320, 323, 325.

resume_trans: MMIX-PIPE 325, 326.

resuming: MMIX-PIPE 73, 78, 81, 103, 160,
308, 309, 316, 323, 324, MMIX-SIM 60, 61,
71, 125, 127, 130, 141, 164.

Reuter, Andreas Horst: MMIX 31.

reversed: MMIX-PIPE 152.

rewind: MMIX-CONFIG 19, 38.

rF: MMIX 48.

rF: MMIX-PIPE 52, MMIX-SIM 151.

rf: MMIX-ARITH 91, 92.

rG: MMIX 29, 39.

rG: MMIX-PIPE 52, 89, 102, 329, 330, 334, 342,
MMIX-SIM 55, 97, 104, 105, 151, 158.

rH: MMIX 20.

rH: MMIX-PIPE 52, 121, MMIX-SIM 55, 88, 151.

rhat: MMIX-ARITH 1^, 21.

rhs: MMIX-SIM 96, 97, 98, 108, 133, 139.

rI: MMIX 40, MMIX-SIM 1.

rI: MMIX-PIPE 52, 314, MMIX-SIM 93, 127,
151, 158, MMMIX 21.

right: MMIX-SIM 16, 21, 22, 50, 162, 165,
MMIXAL 54, 57, 72, 73, 74.

right_paren: MMIX-SIM 138, 139.

ring: MMIX-CONFIG 36, MMIX-PIPE 26, 28,
29, 34, 35.

ring of local registers: MMIX 42, 43.



ring.size: MMIX-CONFIG 36, MMIX-PIPB 26, 

27, 28, 29, 125. 
rJ: MMIX 19, 29, 35. 

rJ: MMIX-PIPB 85, 107, 119, 312, 319, 

MMIX-SIM 55, 69, 101, 123, 151. 
rK: MMIX 36, 37, 38. 

rK: MMIX-PIPB 149, 314, 317, 322, 328, 

MMIX-SIM M, 77, 151, MMMIX 12, 15, 23. 
rL: MMIX 29, 39, 43. 

rL: MMIX-PIPB 52, 102, 112, 114, 119, 120, 

329, 330, 334, 338, mmix-Sim 37, 81, 

97, 101, 102, 104, 151, 158. 
rl: MMIX-PIPB 44, 46, 100, 112, 114, 119, 120, 

123, 145, 146, 147, 334, 338. 
rM: MMIX 10. 

rM: MMIX-PIPB 52, 107, MMIX-SIM^, 69, 151. 

rN: MMIX 41. 

rN: MMIX-PIPB 89, MMIX-SIM 77, 151. 

rO: MMIX 42, 43. 

rO: MMIX-PIPB M, 98, 118, MMIX-SIM 101, 

102, 104, 151, MMMIX 19. 

Robertson, James Evans: MMIX 40. 

rop: MMIX-SIM W, 71, 125, 126, 130, 164. 

ropcodes: MMIX 38, 47, 49. 

Rossmanith, Peter: MMIX-ARITH 26. 

ROUND_CURRENT: MMIXAL 14, 69.

ROUND_DOWN: MMIX 28, MMIX-ARITH 30, 33,

35, 46, 87, MMIX-PIPE 346, MMIX-SIM 100,
133, MMIXAL 14, 69.
round_mode: MMIX-SIM 61, 89, 133, 138.

ROUND_NEAR: MMIX 28, MMIX-ARITH 30, 33,

35, 84, 87, MMIX-PIPE 346, MMIX-SIM 77,
100, 133, 158, MMIXAL 14, 69.

ROUND_OFF: MMIX 28, MMIX-ARITH 30, 33,

35, 39, 46, 87, 94, MMIX-PIPE 346, MMIX-SIM 100, 133, MMIXAL 14, 69.

ROUND_UP: MMIX 28, MMIX-ARITH 33,

35, 87, MMIX-PIPE 346, MMIX-SIM 100,
133, MMIXAL 14, 69.
rounding modes: MMIX 21, 32.

rP: MMIX 31. 

rP: MMIX-PIPE 52, 283, 335, 341, MMIX-SIM 55, 96, 102, 104, 151.
rQ: MMIX 37, 40, 43. 

rQ: MMIX-PIPE 52, 146, 149, 310, 314, 328,

329, MMIX-SIM 55, 151, MMMIX 12.
rR: MMIX 20. 

rR: MMIX-PIPE 121, 335, 341, MMIX-SIM 55, 88, 102, 104, 151.
rr: MMIX-CONFIG 22. 

rS: MMIX 42, 43. 

rS: MMIX-PIPE 52, 98, 118, MMIX-SIM 82,

83, 101, 102, 103, 104, 105, 151, MMMIX 19.
rT: MMIX 36. 

rT: MMIX-PIPE 52, 122, 310, 312, 372,

MMIX-SIM 55, 77, 151, MMMIX 12.
rt_op: MMIXAL 85, 97, 98.

rTT: MMIX 37.
rTT: MMIX-PIPE 314, MMIX-SIM 77,
151, MMMIX 12.
rU: MMIX 40, MMIX-SIM 1.

rU: MMIX-PIPE 100, 146, MMIX-SIM 55,

127, 140, 151.

running times, approximate: MMIX 50. 

rV: MMIX 44, 45, 47. 

rV: MMIX-PIPE 329, MMIX-SIM 55, 77,

151, MMMIX 12.
rv: MMIX-PIPE 239.
rW: MMIX 34, 38.

rW: MMIX-PIPE 52, 320, 322, 373, MMIX-SIM 55, 109, 123, 124, 151.
rWW: MMIX 36, 38.

rWW: MMIX-PIPE 320, 322, 373, MMIX-SIM 108, 151.

rX: MMIX 34, 37, 38.

rX: MMIX-PIPE 52, 320, 322, MMIX-SIM 55,

60, 123, 124, 126, 151, 164.
rXX: MMIX 36, 38.

rXX: MMIX-PIPE 52, 320, 322, 372, MMIX-SIM 108, 151.

rY: MMIX 34, 38.

rY: MMIX-PIPE 321, 324, MMIX-SIM

123, 126, 151.
rYY: MMIX 36, 38.

rYY: MMIX-PIPE 52, 321, 323, 324, MMIX-SIM 108, 151.

rZ: MMIX 34, 38.

rZ: MMIX-PIPE 52, 321, 324, 335, 339, MMIX-SIM 102, 103, 104, 105, 123, 126, 151.
rZZ: MMIX 36, 38.

rZZ: MMIX-PIPE 52, 321, 323, 324, MMIX-SIM 108, 151.

S: MMIX-SIM 76. 

s: MMIX-ARITH 7, Mi 38i Ml 40i Ml 

66 . 68 . 89 . MMix-io Ml Ml mmix-mem 1_, 

MMIX-PIPE 21, 28i Ml IMi Mil MI, ME, 
191 . 193 . 196, 205, 385, mmix-Sim 13, IM, 
154 , MMIXAL 28, il, 57- 

S_BIT : MMIX-PIPE Ml 149. 

S_non_miss: MMIX-PIPE 224.

SADD: MMIX 12, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.

sadd: MMIX-CONFIG 15, 28, MMIX-PIPE 

51, 344. 

SADDI : MMIX-PIPE 47, MMIX-SIM 87. 

Satterthwaite, Edwin Hallowell, Jr.: MMIX- 

SIM 131. 

saturating arithmetic: MMIX 11. 

sav: MMIX-PIPE 327, 337. 

SAVE: MMIX 43, 50, MMIX-PIPE 47, 81, 281, 

341, MMIX-SIM M, 102, MMIXAL 63. 
save: MMIX-CONFIG 28, MMIX-PIPE 51,
327, 337, 340. 

Scache: MMIX-CONFIG 17, 21, 35, 36, MMIX-PIPE 39, 1^, 215, 217, 218, 219, 220, 221,
222, 224, 225, 226, 234, 261, 274, 300, 360,
364, 367, 378, 379, MMMIX 21.



scan_close: MMIXAL 98.
scan_const: MMIX-ARITH 68, 69, MMIX-SIM 1^, 153.

scan_eql: MMIX-SIM 153.

scan_hex: MMIX-SIM 152, 153, 154, 155, 161.

scan_open: MMIXAL 86.

scan_option: MMIX-SIM 142, 143, 149.

scan_string: MMIX-SIM 153, 155.

scan_type: MMIX-SIM 152, 153.

schedule: MMIX-PIPE 27, 28, 31, 125, 326, 368.

schedule_bit: MMIX-PIPE 8, 10, 28, 33.

Schwoon, Stefan: MMIX-ARITH 26. 

Sclean: MMIX-PIPE 234. 

Sclean_inc: MMIX-PIPE 234.

Sclean_loop: MMIX-PIPE 234.

sclock: MMIX-SIM 19, 93, 127, 140. 

security violation: MMIX 37. 

security_disabled: MMIX-CONFIG 15, MMIX-PIPE 67.

Sedgewick, Robert: MMIXAL 54. 

SEEK_END : MMIX-IO 2, 21, MMIX-SIM 4. 

SEEK_SET : MMIX-IO 2, 21, MMIX-SIM 4, 45, 46. 

segments: MMIX 44, 45, 47, MMMIX 9. 

Seidel, Raimund: MMIX-SIM 16. 

self: MMIX-PIPE 124, 125, 134, 215, 217, 222, 

224, 225, 226, 233, 234, 237, 257, 259, 260, 
261, 262, 264, 266, 272, 274, 279, 298, 300, 
301, 310, 350, 356, 358, 359, 360, 361, 362, 
364, 365, 366, 367, 368. 
sentinel: MMIX-PIPE 35, 36, 125. 

serial: MMIX-CONFIG 22, MMIX-PIPE 164 . 186, 

187, 189, 191, MMIXAL 59, 73, 74, 75, 
78, 100, 109, 112, 114, 118, 125, 130. 
serial number: MMIXAL 11, 21. 

seriaLnumber : MMIXAL 59, TO, 109. 

serialize: MMIXAL 59, 86, 100. 

SET : MMIX 10, MMIXAL 13, TO, 63, 124. 

set: MMIX-CONFIG 28, 32, MMIX-PIPE 49, 51, 

109, 137, 167, 177, 181, 192, 233, 234, 

343, MMMIX 12, 23. 

set_l: MMIX-PIPE 44, 46, 100, 112, 114, 119,

120, 123, 145, 146, 147, 334, 338.
set_lock: MMIX-PIPE 81, 215, 217, 219,

222, 224, 225, 226, 233, 234, 237, 259,

260, 261, 262, 264, 271, 272, 274, 276,

277, 297, 298, 300, 310, 358, 359, 360, 361,
362, 365, 366, 367, 368.
set_round: MMIX-PIPE 281, 346.

set_type: MMIX-SIM 153.

SETH: MMIX 13, MMIX-PIPE 47, 112, 323,

MMIX-SIM 54, 71, 85, MMIXAL 63, 128.

SETL: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63, 128.

SETMH: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.

SETML: MMIX 13, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.

setsz: MMIX-CONFIG 13, 15, 23. 

seven_octa: MMMIX 23, 25.
sfile: MMIX-IO 6, 7, 8, 10, 11, 12, 13, 14, 15,

16, 17, 18, 19, 20, 21, 22, 23. 

SFLOT: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 89, MMIXAL 63.

SFLOTI: MMIX-PIPE 47, MMIX-SIM 54, 89.

SFLOTU: MMIX 27, 28, MMIX-PIPE 47, MMIX-SIM 54, 89, MMIXAL 63.

SFLOTUI: MMIX-PIPE 47, MMIX-SIM 89.

sfpack: MMIX-ARITH 34, 39, 40, 90.

sfunpack: MMIX-ARITH 39, 90.
sh: MMIX-CONFIG 15, 28, MMIX-PIPE 49, 141.
sh_check: MMIXAL 97.
shift_amt: MMIX-PIPE 141, MMIX-SIM 87.

shift_left: MMIX-ARITH 7, 31, 34, 37, 38, 43,

45, 47, 49, 51, 52, 53, 55, 63, 73, 87, 88, 89,

92, 94, 95, MMIX-MEM 1, 2, MMIX-PIPE 21,
22, 113, 114, 118, 139, 141, 244, 279, 282,
333, 339, MMIX-SIM 13, 14, 85, 87, 94, 95,
154, 155, MMIXAL 28, 29, 94, 95, 101.

shift_right: MMIX-ARITH 7, 29, 31, 33, 39,

45, 47, 49, 51, 53, 63, 87, 88, 89, 94,
MMIX-MEM 1, 3, MMIX-PIPE 21, 141, 239,
243, 279, 282, 334, 343, MMIX-SIM 13, 87,
94, 95, MMIXAL 101, 114, 126, 131.
shl: MMIX-PIPE 51, 141, MMIXAL 

97, 101. 

shlu: MMIX-PIPE 51, 141.
short float: MMIX 26, 27. 

show_breaks: MMIX-SIM 161, 162.
show_line: MMIX-SIM 50, 51, 82, 83,

103, 105, 128.

show_pred_bit: MMIX-PIPE 8, 46, 152, 160.
show_spec_bit: MMIX-MEM 2, 3, MMIX-PIPE 8.
show_stats: MMIX-SIM 128, 140, 141, 149.
show_wholecache_bit: MMIX-PIPE 8, 177.

showing_source: MMIX-SIM 49, 51, 53,

128, 143.

showing_stats: MMIX-SIM 128, 129, 141, 143.

shown_file: MMIX-SIM 47, 49, 53.

shown_line: MMIX-SIM 47, 49, 53, 128.

shr: MMIX-PIPE 51, 141, MMIXAL 

97, 101. 

shrt: MMIX-PIPE 21, MMIX-SIM 13 . 

shru: MMIX-PIPE 51, 141. 

SIGINT : MMIX-SIM 147, 148. 

sign: MMIX-ARITH 68, 70, 73, 84. 

sign_bit: MMIX-ARITH 4, 12, 24, 33, 35, 37,

38, 39, 40, 41, 44, 46, 54, 87, 89, 91, 

93, MMIX-CONFIG 32, 33, MMIX-IO 21 . 
MMIX-PIPE OT, 81, 82, 85, 89, 91, 100, 113, 
118, 119, 140, 143, 144, 149, 157, 160, 177, 
179, 205, 230, 233, 234, 244, 266, 271, 279, 
288, 296, 320, 322, 331, 346, 353, 354, 355, 
364, 368, MMIX-SIM 15, 84, 85, 89, 90, 91, 

94, 95, 108, 123, 124, 127, 157, 159, 161. 

signal: MMIX-SIM 147, 148. 

signaling NaN: MMIX 21. 

signed integers: MMIX 6, 7. 



signed_odiv: MMIX-ARITH 24, MMIX-PIPE 21,

343, MMIX-SIM 13, 88.

signed_omult: MMIX-ARITH 12, MMIX-PIPE 21,

343, MMIX-SIM 13, 88.
sim: MMIX-PIPE 21, MMIX-SIM 13.

sim_file_info: MMIX-IO 5, 6.

Singh, Balbir: MMIX-ARITH 26. 

Sites, Richard Lee: MMIX 3, 40. 

size: MMIX-IO 4, 12, 13, 14, 15, 1^, 17, 1^, 

MMIX-MEM 1, 2, 3, MMIX-PIPE 208 . 246 . 
256, 260, 3M: mmix-Sim 4, 114, 117 . 

SL: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.

sleep: MMIX-PIPE 125, 224, 257, 272, 274,

298, 300, 301.

sleepy: MMIX-PIPE 301, 302, 303.

SLI: MMIX-PIPE 47, MMIX-SIM 54, 87.

SLU: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.

SLUI: MMIX-PIPE 47, MMIX-SIM 54, 87.

sl3: MMMIX 19, 20. 

Sorry, I can’t open. . . : MMIX-SIM 145, 146. 

source: MMIXAL 126 . 131 . 

spec: MMIX-PIPE^, 41, 42, 43, 44, 92, 93, 284. 

spec_bit: MMIXAL 62, 102.

spec_install: MMIX-PIPE 94, 95, 110, 112, 113,

114, 117, 118, 119, 120, 121, 312, 322, 333,
334, 338, 339, 340, 355.

spec_mode: MMIXAL 44, 52, 102, 132.

spec_mode_loc: MMIXAL 43, 52, 132.

spec_read: MMIX-MEM 1, 2, MMIX-PIPE 206,

208, 271.

spec_reg_code: MMIX-SIM 151, 152.

spec_regg_code: MMIX-SIM 151, 152.

spec_rem: MMIX-PIPE 96, 97, 123, 145, 146,

147, 256.

spec_write: MMIX-MEM 1, 3, MMIX-PIPE 206,

208, 246, 260.

special registers: MMIX 39, 43.

special_name: MMIX-PIPE 53, 91, MMIX-SIM 56,

103, 105, 138, MMIXAL 66, OT. 
special.reg: MMIX-SIM 

specnode: MMIX-CONFIG 37, MMIX-PIPE 

43, 44, 71, 86, 92, 93, 94, 95, 96, 97, 100, 

115, 120, 255, MMMIX 23. 

specnode_struct: MMIX-PIPE 40 . 

specval: MMIX-PIPE 92, 93, 104, 105, 106, 

108, 113, 114, 118, 120, 122, 312, 322, 
323, 324, 339. 

speed.lock: MMIX-PIPE 39, 2£l, 257, 362. 

Sprep: MMIX-PIPE 233, 234 . 

sprintf: MMIX-SIM 24, 45, 80, 101, MMIXAL 45, 

138, MMOTYPE 28. 

square_one: MMIX-PIPE 272, 369, 370.

SR: MMIX 14, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.

src_file: MMIX-SIM 42, 45, 47, 48, 49,

MMIXAL 34, 35, 138, 139.
src_file_name: MMIXAL 137, 138, 139, 140.
SRI: MMIX-PIPE 47, MMIX-SIM 87.

SRU: MMIX 14, MMIX-PIPE 47, MMIX-SIM

87, MMIXAL 63.

SRUI: MMIX-PIPE 47, MMIX-SIM 54, 87.

sscanf: MMIX-CONFIG 38, MMIX-SIM 143,
MMIXAL 137, MMMIX 7, 8, 15, 18, 19, 21. 
st: MMIX-CONFIG 27, 28, MMIX-PIPE 51, 

117, 254, 265, 266, 267, 270, 271, 272, 

279, 280, 321, 327. 
st_mtime: MMIX-SIM 44.

st_ready: MMIX-PIPE 267, 270, 271, 272, 280.

stab_start: MMOTYPE 25, 29, 30.

Stack overflow: MMIX 45.

stack pointer: MMIXAL 18.

stack_alert: MMIX-PIPE M, 100, 113, 146, 269.
stack_load: MMIX-SIM 101, 104.

stack_op: MMIXAL 83, 84.

STACK_OVERFLOW: MMIX-PIPE CT, 146.

stack_overflow: MMIX-PIPE 146, 148.

Stack_Segment: MMIX-SIM 3, 37, MMIXAL 69.

stack_store: MMIX-SIM 81, 83, 101, 102, 103.

stack_tracing: MMIX-SIM 61, 82, 83, 103,

105, 143.

stage: MMIX-CONFIG 26, 34, 35, 36, MMIX- 

PIPE 23, 25, 26, 28, 39, 59, 124, 125, 126, 
128, 129, 134, 136, 174, 231, 236, 249, 284. 
stages: MMIX-CONFIG 27, 28, 29. 

stall: MMIX-PIPE 75, 82, 101, 102, 111, 120, 

312, 322, 332. 

stamp: MMIX-PIPE 246 , 251, 256, 257, MMIX- 

SIM 16, 17, 21. 

standard floating point conventions: MMIX 22. 

standard_NaN: MMIX-ARITH 4, 41, 44,

46, 91, 93. 

start_fetch: MMIX-PIPE 288, 289.

start_ld_st: MMIX-PIPE 265.

startup: MMIX-PIPE 31, 81, 203, 219, 221, 

225, 233, 244, 249, 257, 259, 260, 261, 266, 
267, 271, 272, 273, 274, 276, 277, 286, 287, 
288, 291, 296, 297, 298, 300, 353, 354, 358, 
359, 360, 361, 365, 366. 

stat: MMIXAL 

stat: MMIX-SIM 43, 44. 

stat.buf : MMIX-SIM M. 

state: MMIX-PIPE 30, 31, M, 46, 124, 125, 

130, 131, 133, 134, 135, 215, 217, 219, 

222, 224, 232, 233, 234, 237, 257, 259, 

260, 262, 264, 265, 267, 268, 270, 271, 272, 
273, 274, 276, 277, 278, 279, 280, 281, 288, 
291, 292, 295, 296, 297, 298, 300, 301, 310, 
325, 326, 345, 351, 354, 358, 359, 360, 361, 
364, 368, MMIX-SIM 160 . 
state_4: MMIX-PIPE 308, 310, 311.
state_5: MMIX-PIPE 307, 310, 311.
status: MMIXAL 87, 94, 98, 99, 100, 

101, 109, 110, 116, 117, 118, 121, 122, 

123, 124, 129. 

STB: MMIX 8, MMIX-PIPE 47, 256, 281,

MMIX-SIM 54, 95, 123, MMIXAL 63.

STBI: MMIX-PIPE 47, MMIX-SIM 54, 95.

STBU: MMIX 8, MMIX-PIPE 281, MMIX-SIM 54, 95, MMIXAL 63.

STBUI: MMIX-PIPE 47, MMIX-SIM 54, 95.

STCO: MMIX 8, MMIX-PIPE 47, 117, 256,

MMIX-SIM 54, 95, MMIXAL 63.

STCOI: MMIX-PIPE 47, MMIX-SIM 54, 95.

StdErr : MMIX-SIM 4, 134, 137, MMIXAL 69. 

stderr: MMIX-CONFIG 8, MMIX-IO 7, MMIX-PIPE 13, 381, 384, MMIX-SIM 4, 14, 24, 26,
35, 44, 49, 143, 145, 146, MMIXAL 35, 45, 79,

137, 142, 145, MMMIX 3, 6, 7, 8, 9, 10, 11, 
12, MMOTYPE 2, 3, 9, 14, 20, 23, 25, 26, 30. 

StdIn: MMIX-SIM 4, 134, 137, MMIXAL 69.

stdin: MMIX-IO 7, 10, 13, 15, 17, MMIX-MEM 2, 

MMIX-PIPE 387, MMIX-SIM 4, 120, 150, 
MMMIX 13. 

StdIn>: MMIX-PIPE 387, MMIX-SIM 120.

stdin_buf: MMIX-PIPE 387, 388, MMIX-SIM 120, 121.

stdin_buf_end: MMIX-PIPE 387, 388, MMIX-SIM 120, 121.

stdin_buf_start: MMIX-PIPE 387, 388, MMIX-SIM 120, 121.

stdin_chr: MMIX-IO 4, 13, 15, 17, MMIX-PIPE 377, 387, MMIX-SIM 120.

StdOut: MMIX-SIM 4, 134, 137, MMIXAL 69.

stdout: MMIX-IO 7, MMIX-PIPE 387, MMIX- 

SIM 4, 120, 133, 137, 138, 150, 156, 159, 
MMMIX 13. 

STHT: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.

STHTI: MMIX-PIPE 47, MMIX-SIM 54, 95.

sticky bit: MMIX-ARITH 31, 34, 49, 53, 79, 87.

STO: MMIX 8, MMIX-PIPE 47, MMIX-SIM 54,

95, MMIXAL 63.

STOI: MMIX-PIPE 47, MMIX-SIM 54, 95.

Stone, Harold Stuart: MMIX 31. 

stop: MMIX-IO 4, MMIX-PIPE 381 , 382, 383, 

MMIX-SIM 114 , 115, 116. 
store_fx: MMIX-SIM 89.

store_new_char: MMIXAL 57.

store_sf: MMIX-ARITH MMIX-PIPE 21, 281,

MMIX-SIM 1^, 95.

store_x: MMIX-SIM 85, 86, 87, 88, 89, 90,

92, 94, 97, 102, 107.

STOU: MMIX 8, MMIX-PIPE 47, 113, 339,

MMIX-SIM 54, 95, MMIXAL 63.

STOUI: MMIX-PIPE 47, MMIX-SIM 54, 95.

strcmp: MMIX-CONFIG 18, 19, 20, 21, 22, 23, 

24, 38, MMIXAL 38, MMMIX 4, MMOTYPE 28. 
strcpy: MMIX-CONFIG 10, 18, 25, 38, MMIX-SIM 96, 97, 98, 107, 108, 149, MMIXAL 137,

138, MMOTYPE 28.

stream_name: MMIX-SIM 137.

string: MMIX-IO 1^, 20, MMIX-SIM 4.

strlen: MMIX-ARITH 67, MMIX-CONFIG 10, 25,

27, 38, MMIX-PIPE 387, MMIX-SIM 4, 24, 42,
45, 120, 143, 149, 150, 163, MMIXAL 34, 50,
138, MMMIX 4, 6, MMOTYPE 28.
strncmp: MMIX-ARITH 68, 79.

strncpy: MMOTYPE 28.
strong: MMIXAL 82, 83. 

STSF: MMIX 26, MMIX-PIPE 47, 256, 281,

MMIX-SIM 95, MMIXAL 63.

STSFI: MMIX-PIPE 47, MMIX-SIM 54, 95.

STT: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 95, MMIXAL 63.

STTI: MMIX-PIPE 47, MMIX-SIM 95.

STTU: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.

STTUI: MMIX-PIPE 47, MMIX-SIM 54, 95.

STUNC: MMIX 30, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.
stunc: MMIX-PIPE^, 251, 254, 257, 281.

STUNCI: MMIX-PIPE 47, MMIX-SIM 54, 95.

STW: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.

STWI: MMIX-PIPE 47, MMIX-SIM 95.

STWU: MMIX 8, MMIX-PIPE 47, 281, MMIX-SIM 54, 95, MMIXAL 63.

STWUI: MMIX-PIPE 47, MMIX-SIM 54, 95.

style: MMIX-SIM 133, 134, 137 . 

SUB: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.

sub: MMIX-CONFIG 28, MMIX-PIPE 44,

51, 140.

SUBI: MMIX-PIPE 47, MMIX-SIM 54, 85.

subnormal numbers: MMIX 21. 

subroutine library initialization: MMIX-SIM 6, 

164. 

SUBSUBVERSION : MMIX-PIPE 89, MMIX-SIM 77. 

SUBU: MMIX 9, MMIX-PIPE 47, MMIX-SIM 54,

85, MMIXAL 63.

subu: MMIX-CONFIG 28, MMIX-PIPE

51, 139.

SUBUI: MMIX-PIPE 47, MMIX-SIM 54, 85.

SUBVERSION : MMIX-PIPE MMIX-SIM 77. 

support: MMIX-PIPE 78, 79, 80. 

suppress_dispatch: MMIX-PIPE 64, 65, 317.

switchable_string: MMIX-SIM 138, 139.

switch0: MMIX-PIPE 288, 299.
switch1: MMIX-PIPE 130, 133, 265, 327,

345, 359, 360.

switch2: MMIX-PIPE 135, 364.

SWYM: MMIX 49, MMIX-PIPE 47, 301, 321, 323,

325, MMIX-SIM 54, 107, MMIXAL 63.
swym_one: MMIX-PIPE 301, 302.

sy: MMIX-ARITH 24 . 

sym: MMIXAL M, 64, 66, 70, 71, 72, 73, 74, 

75, 76, 78, 87, 91, 100, 104, 110, 111, 

118, 125, 130, 144. 
sym_avail: MMIXAL 59, 60.

sym_buf: MMIXAL 75, 77, 78, 79, 80,

MMOTYPE 25, 26, 27, 28, 29.
sym_length_max: MMOTYPE 26, 29.

sym_node: MMIXAL 58, 59, 60, 65, 74, 90, 109.

sym_ptr: MMIXAL 75, 77, 78, 79, 80,

MMOTYPE 25, 26, 28, 29.
sym_root: MMIXAL 60.

sym_tab_struct: MMIXAL 54, 58.

Symbol table. . . : MMOTYPE 22, 25.

symbol ... already defined: MMIXAL 109.

symbol_found: MMIXAL CT, 88, 89.

SYNC: MMIX 31, MMIX-PIPE 47, 304, 323,

MMIX-SIM 54, 107, MMIXAL 63.
sync: MMIX-CONFIG 28, MMIX-PIPE 51,

230, 233, 234, 251, 254, 256, 257, 355,
356, 361.

sync_check: MMIX-PIPE 269, 271, 370.

sync_L: MMIX-SIM 101.

SYNCD: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54,

106, MMIXAL 63.

syncd: MMIX-PIPE 51, 230, 265, 269, 271,

280, 320, 323, 364, 368, 369.

SYNCDI: MMIX-PIPE 47, MMIX-SIM 54, 106.

SYNCID: MMIX 30, MMIX-PIPE 47, MMIX-SIM 54, 106, MMIXAL 63.
syncid: MMIX-PIPE 51, 85, 119, 265, 266,

267, 269, 270, 271, 272, 280, 320, 323.
SYNCIDI: MMIX-PIPE 47, MMIX-SIM 54, 106.

syntax error . . . : MMIXAL 86, 97. 

syntax of floating point constants: MMIX- 

ARITH 68. 

sys_call: MMIX-PIPE 371, MMIX-SIM 59.

system dependencies: MMIX-ARITH 3, MMIX-IO 16, MMIX-PIPE 17, 89, MMIX-SIM 10, 43,
44, 77, MMIXAL 26, MMOTYPE 27.
System/360: MMIX 7. 

System/370: MMIX 31. 

SZ : MMIX-ARITH 24. 

t: MMIX-ARITH 8, 13, 90, MMIX- 

PIPE 95, 97, 197, 2££, mmix- 

SIM 15, 166, MMIXAL M, £i, 73, 74, 
MMOTYPE 8. 

tag: MMIX-CONFIG 32, 33, MMIX-PIPE 167 , 

172, 176, 177, 179, 185, 193, 196, 197, 201, 
203, 205, 206, 210, 213, 216, 217, 218, 219, 
221, 223, 226, 233, 234, 245, 259, 276, 353, 
354, 378, 379, MMMIX 12, 23.
tagmask: MMIX-CONFIG 31, MMIX-PIPE 167,

192, 193, 205. 

tail: MMIX-PIPE 64, M, 71, 73, 74, 85, 120, 

160, 301, 304, 308, 309, 316, MMMIX 12, 22.
TC: MMIX 46. 

TDIF: MMIX 11, MMIX-PIPE 47, MMIX-SIM 54,

87, MMIXAL 63.

tdif: MMIX-CONFIG 28, MMIX-PIPE 51, 344.

tdif_l: MMIX-PIPE 344, MMIX-SIM OT.

TDIFI: MMIX-PIPE 47, MMIX-SIM 54, 87.

terabytes: MMIX 42, 45.

terminate: MMIX-PIPE 125, 126, 144, 215, 217,

221, 222, 224, 232, 237.
terminator: MMIXAL 57, 87.

ternary_trie_struct: MMIXAL 54.
test_load_bkpt: MMIX-SIM 94, 96, 105,

111, 114.

test_overflow: MMIX-SIM 88.
test_store_bkpt: MMIX-SIM 82, 95, 96, 103, 117.

tet: MMIX-SIM 16, 25, 26, 28, 30, 33, 34, 37, 

51, 63, 82, 83, 94, 95, 96, 103, 105, 111, 
114, 118, 119, 157, 159, 163, 164, 165, 
MMOTYPE 9, 11, 15, 18, 19, 21, 23, 24. 
TETRA: MMIXAL 17, 62, 63, 117.

tetra: MMIX-ARITH 3, 7, 8, 13, 26, 27, 28,

29, 34, 38, 39, 40, 54, 59, 60, 61, 62, 82,
90, MMIX-IO 3, MMIX-PIPE 17, 21, 68, 73,
76, 78, 91, 120, 206, 210, 213, 246, 255,
MMIX-SIM 10, 13, 15, 16, 19, 25, 31, 44, 61,
62, 95, 101, 114, 164, 165, 166, MMIXAL 26,
43, 48, 52, 68, 76, 105, 120, MMMIX 15,

20, MMOTYPE 7, 8, 11.

tetrabyte: MMIX 6. 

Text_Segment : MMIX-SIM 3. 

TextRead : MMIX-SIM 4, MMIXAL 69. 

TextWrite : MMIX-SIM 4, MMIXAL 69. 

The number of local . . . : MMIX-SIM 77. 

the operand is undefined : MMIXAL 109, 

129. 

The symbol table isn’t... : MMOTYPE 30. 

thinking big: MMIX-PIPE 58, 74.

third_operand: MMIX-PIPE 103, 107, 108,

MMIX-SIM §4, 71, 79.

This can’t happen: MMIX-PIPE 13.

three_arg_bit: MMIXAL 62, 116.

thresh: MMIX-ARITH 93, 94. 

ticks: MMIX-MEM 2, 3, MMIX-PIPE 10, 14, 28, 

64, W, 187, 251, 256, 257, MMMIX 2, 15, 23.
time: MMIX-PIPE 89, MMIX-SIM 77, 

MMIXAL 141. 

times: MMIXAL 97, 101. 

tininess: MMIX-ARITH 31. 

TLB: MMIX-PIPE 163. 

tmp: MMIX-SIM 31, 34, MMMIX 1^, 18, 22, 

MMOTYPE 16, 19, 24. 
tmpo: MMIX-PIPE 141 . 

token: MMIX-CONFIG 9, 10, 11, 18, 19, 20, 

21, 22, 23, 24, 25. 

token_prescanned: MMIX-CONFIG 9, 10, 22, 24.

Tomasulo, Robert Marco: MMIX-PIPE 58. 

too many global registers : MMIXAL 108. 

too many operands... : MMIXAL 116. 

top_op: MMIXAL 83, 85.

top_val: MMIXAL 87, 94, 98, 99, 100, 101.

trace_bit: MMIX-SIM 63, 161, 162.

trace_format: MMIX-SIM 64, 65, 131.

trace_print: MMIX-SIM 136, 137.

trace_threshold: MMIX-SIM 61, 63, 143.

tracing: MMIX-SIM W, 63, 82, 83, 93, 103, 105,

107, 122, 127, 128, 149.
tracing_exceptions: MMIX-SIM 61, 122, 143.

trailing characters... : MMIXAL 35. 

trans: MMIX-PIPE 241. 



trans.key: MMIX-PIPE 2^, 245, 267, 272, 291, 

298, 302, 326, 353, 354. 
translation caches: MMIX 46, 47, 49, MMIX- 

PIPE 163. 

TRAP: MMIX 33, 36, 50, MMIX-PIPE 47, 80, 82,

320, MMIX-SIM 54, 108, MMIXAL 63.
trap: MMIX-CONFIG 28, MMIX-PIPE 49, 51, 80,

81, 82, 85, 103, 149, 310, 312, 313, 317, 320.
trap_format: MMIX-SIM 108, 110, 139.

trap_loc: MMIX-PIPE 373.

traps: MMIX 35.

trie_node: MMIXAL 54, 55, 56, 57, 65, 73,

74, 82, 90.

trie_root: MMIXAL 56, 61, 66, 70, 71, 80,

87, 111, 144.

trie_search: MMIXAL 57, 64, 66, 70, 71, 87,

104, 111, 144.

TRIP: MMIX 33, 35, 50, MMIX-PIPE 47,

MMIX-SIM 54, 108, 123, MMIXAL 63.
trip: MMIX-CONFIG 28, MMIX-PIPE 49, 51,

80, 85, 312, 313, 317.
trip_warning: MMIX-IO 23, 24.

tripping: MMIX-SIM W, 123, 131. 

trips: MMIX 35. 

true: MMIX-ARITH 1, 24, 68, MMIX-CONFIG 15, 

22, 24, MMIX-PIPE 11, 59, 68, 85, 89, 100, 
106, 108, 110, 112, 113, 114, 117, 118, 119, 
120, 121, 144, 146, 170, 185, 217, 227, 236, 
238, 239, 259, 262, 263, 265, 302, 304, 310, 
312, 314, 316, 317, 322, 324, 330, 331, 332, 
333, 334, 337, 338, 339, 340, 345, 350, 355, 
361, 364, 373, MMIX-SIM 9, 45, 51, 63, 82,
83, 87, 90, 93, 103, 105, 107, 109, 122, 123, 
125, 127, 128, 141, 142, 143, 148, 149, 150, 
153, 164, MMIXAL 26, 35, 41, 71, 87, 111, 
132, MMMIX 6, 7, 8, 9, 10, 11. 
true_head: MMIX-PIPE 74, 81.

try_complement: MMIX-ARITH 94, 95.

trying_to_interrupt: MMIX-PIPE 314, 315, 330,

351, 363, 364. 

tt: MMIX-ARITH 65, 66, 83, MMIX-PIPE 28, 

MMIXAL 57, 64, 66, 70, 87, 88, 89, 111. 

Two file names. . . : MMOTYPE 20. 

two_arg_bit: MMIXAL 62, 116.

Type tetra. . . : MMIXAL 29. 



t0: MMMIX Ip.

t1: MMMIX Ip.

t2: MMMIX Ip.

t3: MMMIX Ip.



v: MMIX 50, MMIX-SIM 1. 

u: MMIX-ARITH 7, 8, 13, MMIX-MEM 1, 

MMIX-PIPE 21, 75, 79, 97, MMIX-SIM 13,
MMIXAL 28.

U_BIT: MMIX-ARITH 31, 33, 35, MMIX-PIPE M,

307, MMIX-SIM 57, 89, 122, MMIXAL 69.
U_Handler : MMIXAL 69. 

unary: MMIXAL 83. 

unary_check: MMIXAL 100.
undefined: MMIXAL 87, 99, 101, 109, 110,

117, 118, 121, 122, 123, 124, 129. 
undefined constant : MMIXAL 118. 

undefined local symbol : MMIXAL 145. 

undefined symbol : MMIXAL 79. 

underflow: MMIX 21, 22, 32, MMIX-ARITH 31. 

undump_octa: MMMIX 9, 10, 11.

Unexpected end of file... : MMMIX 11, 

MMOTYPE 9. 

Unicode: MMIX 6, MMIXAL 5, 6, 7, 30, 75, 

MMOTYPE 27. 

uninit_mem_bit: MMIX-PIPE 8, 210.

uninitialized memory... : MMIX-PIPE 210.

unit_busy: MMIX-PIPE 82.

unit_found: MMIX-PIPE 82.

Unknown lopcode: MMOTYPE 13.

unknown operation code: MMIXAL 104.

UNKNOWN_SPEC: MMIX-PIPE 71, 73, 85, 120,

123, 290, 309.

unsav: MMIX-PIPE 49, 327, 332.

UNSAVE: MMIX 43, 50, MMIX-PIPE 47, 81, 102,

279, 332, 335, MMIX-SIM 54, 104, 164,
MMIXAL 63, MMMIX 12.
unsave: MMIX-CONFIG 28, MMIX-PIPE

51, 327, 332.

unschedule: MMIX-PIPE 145, 287.

unsgnd: MMIX-PIPE 21, MMIX-SIM 13.

Unsupported virtual address : MMMIX 11. 

up: MMIX-PIPE 73, 85, 86, 89, 93, 95, 97, 

100, 102, 114, 116, 117, 120, 146, 227, 

254, 255, 312, 333, 334. 
update_listing_loc: MMIXAL 44.

usage: MMIX-PIPE 46, 81, 100, 146, 324,

MMIX-SIM 143.

Usage: ... : MMIX-SIM 143, MMIXAL 137,

MMMIX 3, MMOTYPE 2.
usage_help: MMIX-SIM 143, 144.

use_and_fix: MMIX-PIPE 195, 196, 198, 201,

217, 262, 268, 270, 271, 272, 273, 292, 

293, 296, 353, 354. 
useful : MMIXAL 73. 

v: MMIX-ARITH 8, 13, MMIX-CONFIG 11, 12, 

13 . 14 , MMIX-PIPE 167 . 

V_BIT : MMIX-ARITH MMIX-PIPE 140, 

141, 282, 343, MMIX-SIM 57, 84, 85, 87, 

88, 95, MMIXAL 69. 

V_Handler : MMIXAL 69. 

val: MMIX-ARITH 68, ra, 71, 73, 83, 84, 

MMIX-MEM 2, 3, MMIX-PIPE 208, 212,

213, MMIX-SIM 13, 30, 153, 155, 157,
158, 161, MMMIX 17.
val_node: MMIXAL 83, 84.

val_ptr: MMIXAL 85, 87, 94, 99, 110,

116, 117.

val_stack: MMIXAL 81, 82, 84, 85, 108, 109,

110, 116, 117, 118, 121, 122, 123, 124, 125, 
126, 127, 129, 130, 131, 132, 134. 
Vandevoorde, Mark Thierry: MMIX 40. 



vanish: MMIX-CONFIG 34, MMIX-PIPE 126, 

128, 129, 260. 

vanish_ctl: MMIX-PIPE 127, 128.

vctsz: MMIX-CONFIG 1^, 15, 23. 

verb: MMIXAL 100, 101. 

verbose: MMIX-MEM 2, 3, MMIX-PIPE 4, 10, 

28, 33, 46, 81, 125, 145, 146, 147, 149, 
152, 160, 177, 210, 283, 310, 314, 319, 
320, 321, MMIX-SIM 140 . MMMIX 15, 
MMOTYPE 2, 4, 9, 30. 

VERSION : MMIX-PIPE MMIX-SIM 77. 

version number: MMIX 41, 51. 

vh: MMIX-ARITH 1^, 17, 21. 

victim: MMIX-CONFIG 33, MMIX-PIPE 167 . 

177, 181, 193, 196, 199, 205, 233, 234. 
VIIIADDU: MMIX-PIPE 47, MMIX-SIM 54, 85.

VIIIADDUI: MMIX-PIPE 47, MMIX-SIM 85.

virt: MMIX-PIPE 241.

virtual address emulation: MMIX 49. 

virtual addresses: MMIX 44, 45, 47. 

vmh: MMIX-ARITH 13, 17, 21. 

vrepl: MMIX-CONFIG 16, 23, MMIX-PIPE 167 . 

196, 199, 205. 

Vuillemin, Jean Etienne: MMIX-SIM 16. 

vv: MMIX-CONFIG 16, 23, 31, 33, MMIX-PIPE 167, 177, 181, 193, 196, 199, 205,
233, 234.

W: MMIX-ARITH 8, MMIX-SIM 61.

W_BIT: MMIX-ARITH 31, 88, MMIX-PIPE M,

346, MMIX-SIM 57, 89, MMIXAL 69.
W_Handler : MMIXAL 69. 

wait: MMIX-PIPE 125, 131, 133, 134, 215, 216, 

217, 218, 219, 221, 222, 223, 224, 225, 

233, 234, 237, 257, 259, 260, 261, 262, 

263, 264, 266, 271, 272, 273, 276, 277, 

278, 279, 281, 283, 288, 290, 297, 298, 301, 
310, 326, 328, 329, 330, 342, 350, 351, 353, 
354, 356, 357, 358, 359, 360, 361, 362, 363, 
364, 365, 366, 367, 368. 
wait_or_pass: MMIX-PIPE 2^, 292, 295, 296.

Waldspurger, Carl Alan: MMIX 40.

wbuf_bot: MMIX-CONFIG 37, MMIX-PIPE 247,

251, 255, 256, 257, 378, 379.
wbuf_lock: MMIX-PIPE 39, 247, 256, 257, 259,

260, 262, 264, 360.

wbuf_top: MMIX-CONFIG 37, MMIX-PIPE 247,

249, 251, 255, 256, 257, 378, 379.
wcslen: MMIX-SIM 4. 

WDIF : MMIX 11, MMIX-PIPE 47, MMIX-SIM M, 

87, MMIXAL 63. 

wdif: MMIX-CONFIG 28, MMIX-PIPE 

51, 344. 

WDIFI: MMIX-PIPE 47, MMIX-SIM 87.

weak: MMIXAL 83.

Weihl, William Edward: MMIX 40.

what_say: MMIX-SIM 149, 152, MMMIX 1^,

15, 18, 19.

Wheeler, David John: MMIX-ARITH 26.
Wilkes, Maurice Vincent: MMIX 30, MMIX-ARITH 26.

Wirth, Niklaus Emil: MMIX 7. 

wow: MMIX-PIPE 11 . 

wra: MMIX-CONFIG 13, 15, 23. 
wrb: MMIX-CONFIG 1^, 15, 23. 

WRITE_ALLOC: MMIX-CONFIG 23, 31, MMIX-PIPE 166, 167, 217, 257.

WRITE_BACK: MMIX-CONFIG 23, MMIX-PIPE 166, 167, 217, 263.
write_bit: MMIX-SIM 82, 161, 162.

write_buf_size: MMIX-CONFIG 1^, 37.

write_co: MMIX-PIPE 248, 249.

write_ctl: MMIX-PIPE 248, 249, 360.

write_from_wbuf: MMIX-PIPE 129, 249, 257,

272.

write_head: MMIX-PIPE 247, 249, 251, 255, 256,

257, 259, 260, 261, 262, 360, 362, 378, 379.
write_node: MMIX-CONFIG 37, MMIX-PIPE 2^, 247, 251, 255, 256, 378, 379.
write_restart: MMIX-PIPE 257, 261.

write_search: MMIX-PIPE 254, 255, 268, 270,

271, 278.

write_tail: MMIX-PIPE 247, 249, 251, 255, 256,

257, 360, 362, 378, 379.

WYDE: MMIXAL 17, 62, 63, 117.

wyde: MMIX 6.

wyde_diff: MMIX-ARITH 28, MMIX-PIPE 21,

344, MMIX-SIM 13, 87.

x: MMIX-ARITH 5, 6, 13: 25, 26, 27, 29, 37, 38, 

m, 40, M, M: 46. 54. 62, 

91 . 93 . MMIX-IO 22, MMIX-PIPE 2X, M. M. 
119 . 120 . 381 . 384 . mmix-Sim 13, 61, 114, 
MMIXAL 28, 48, 76, 120 . mmotype 8. 

X field doesn’t fit... : MMIXAL 123. 

X field is undefined : MMIXAL 123. 

X field. . .register number: MMIXAL 123. 

X_BIT : MMIX-ARITH 33, 35, MMIX-PIPE M, 

307, MMIX-SIM CT, 122, MMIXAL 69. 
x_bits: MMIXAL 52.

X_Handler: MMIXAL 69.

X_is_dest_bit: MMIX-PIPE 101, 312, 320,

MMIX-SIM 60, 126.

X_is_source_bit: MMIX-SIM

x_ptr: MMIX-SIM W, 80, 84.

xar_bit: MMIXAL 62, 123.

xe: MMIX-ARITH £1, 43, M, 45, 47, 49, 

91, 92, 93, 94. 

xf: MMIX-ARITH M, 43, 44, 45, 46, 47, 

87, 91, 92, 93, 94. 

XOR: MMIX 10, MMIX-PIPE 47, MMIX-SIM 54,

86, MMIXAL 63. 

xor: MMIX-ARITH 29, MMIX-CONFIG 28, 

MMIX-PIPE 21, 49, 51, 138, MMIX-SIM 13, 
MMIXAL 97, 101. 

XORI: MMIX-PIPE 47, MMIX-SIM 54, 86.

xr_bit: MMIXAL 62, 123.

xs: MMIX-ARITH M, 43, M. 45, 46, 47, 93. 94. 

XVIADDU: MMIX-PIPE 47, MMIX-SIM 54, 85.

XVIADDUI: MMIX-PIPE 47, MMIX-SIM 54, 85.

xx: MMIX-ARITH 26, MMIX-PIPE M, 46, 100,

102, 106, 110, 114, 117, 118, 119, 120, 
146, 227, 265, 275, 312, 320, 323, 325, 329, 
332, 335, 336, 337, 340, 341, 364, 369, 370, 
MMIX-SIM 60, 62, 74, 80, 95, 97, 101, 102, 
104, 106, 107, 108, 124. 

xyz: MMIXAL 119, 120, 123, 129, 130, 131. 

XYZ field doesn’t fit... : MMIXAL 129. 

xyzar_bit: MMIXAL 62, 129.

xyzr_bit: MMIXAL 62, 129.

y: MMIX-ARITH 5, 6, 7, 8, 12, 1^, 24, 25, 

27, 28, 29. 41, M. 46. 85, M. mmix- 

MEM 1, MMIX-PIPE 21, M, MMIX-SIM 1^, 
W, MMIXAL 28, 48, 120, MMMix 20, 25, 
MMOTYPE 18. 

Y field doesn’t fit... : MMIXAL 122. 

Y field is undefined : MMIXAL 122. 

Y field of lop_post... : MMOTYPE 22. 

Y field. . .register number: MMIXAL 122. 

Y_is_immed_bit: MMIX-SIM 65.

Y_is_source_bit: MMIX-SIM 65.

yar_bit: MMIXAL 62, 122.

ybyte: MMIX-SIM 28, 29, 33, 34, 35. 

ye: MMIX-ARITH £1, 43, M, 45, 47, 48, 

50, 51, 52, 93, 94, 95. 

Yellin, Frank Nathan: MMIX 40. 

yf: MMIX-ARITH £1, 43, M, 45, 46, 47, 48, 49, 

50, 51, 52, 53, 93. 94. 95. 

yhl: MMIX-ARITH 7, MMMIX 20. 

ylh: MMIX-ARITH 7, MMMIX 20 . 

ynp: MMIX-PIPE 241. 

yr_bit: MMIXAL 62, 122.

ys: MMIX-ARITH £1, 44, 47, 48, 49, 50, 

53, 93, 94. 

yt: MMIX-ARITH £1, M, 46, 52, 93. 

yy : MMIX-ARITH 24, MMIX-PIPE M, 46, 100, 

103, 105, 118, 320, 333, 335, 337, 339, 341, 
372, 380, MMIX-SIM 60, 62, 71, 73, 97, 102,

104, 107, 108, 111, 124. 

yz: MMIX-PIPE re, 84, 85, 109, 120, MMIX- 

SIM 60, 62, 70, 78, 101, mmixal 
120, 122, 123, 124, 125, 126, 127, 128, 
MMOTYPE 9, n, 13, 18, 19, 20, 21, 25, 30. 
YZ field at lop_end. . . : MMOTYPE 30. 

YZ field doesn’t fit... : MMIXAL 124. 

YZ field is undefined : MMIXAL 124. 

YZ field of lop_fixrx. . . : MMOTYPE 19. 

YZ field. . .register number: MMIXAL 124. 

YZ field...should be zero: MMOTYPE 25.

YZ field...should be 1: MMOTYPE 13.

yzar_bit: MMIXAL 62, 124.

yzbytes: MMIX-SIM 25, 26, 29, 33, 34, 35, 36.

yzr_bit: MMIXAL 62, 124.

z: MMIX-ARITH 5, 8, 12, 13, 24, 25, 27, 28, 29, 

39, 40, 41, 44. 46. 86. 88. 89, 91, 

93 . MMIX-PIPE 21, 44, MMIX-SIM 13, 61, 
MMIXAL 28, 120 . MMOTYPE 18. 

Z field doesn’t fit... : MMIXAL 121.
Z field is undefined: MMIXAL 121.

Z field of lop_fixo... : MMOTYPE 19.

Z field of lop_loc... : MMOTYPE 18. 

Z field of lop_post. . . : MMOTYPE 22. 

Z field. . .register number: MMIXAL 121. 

Z_BIT: MMIX-ARITH 44, MMIX-PIPE M,

MMIX-SIM 57, MMIXAL 69.

Z_Handler : MMIXAL 69. 

Z_is_immed_bit: MMIX-SIM

Z_is_source_bit: MMIX-SIM 65.

zap_cache: MMIX-PIPE 180, 181, 358, 359, 360.

zar_bit: MMIXAL 62, 121.

zbyte: MMIX-SIM 28, 29, 33, 34, 35, 37. 
ze: MMIX-ARITH 43, M, 45, 46, 47, 48, X, 

51, 52, 87, 91, 92, 93, 94, 95. 

zero: MMIXAL 83. 

zero_exponent: MMIX-ARITH 37, 38, 51.

zero_octa: MMIX-ARITH 4, 24, 29, 31, 39, 41,

44, 45, 46, 53, 73, 83, 88, 89, 93, MMIX-IO 4,
8, 11, 14, 16, 18, 19, 20, 21, MMIX-PIPE 20,
100, 112, 179, 237, 243, 244, 265, 271,

279, 288, 312, 317, 330, 346, 356, 364, 380,
MMIX-SIM 13, 60, 81, 89, 99, 126, 153, 154,
155, 158, 159, MMIXAL 27, 59, 100, 101,
116, MMMIX 12, 21, 23, 25.
zero_out: MMIX-ARITH 93, 95.

zero_spec: MMIX-PIPE 85, 100, 109, 112,

113, 114. 

zeros: MMIX-ARITH 74, 76, 77, 79. 

zf: MMIX-ARITH 41, 43, 44, 45, 47, 48, 

49, 50, 51, 52, 53, 85, 87, 

92, 93, 94, 95. 

zhex: MMIX-SIM 134, 135 . 137. 

zr_bit: MMIXAL 62, 121.

zro: MMIX-ARITH 37, 38, 39, 40, 41, 42, 

44, 46, 50, 52, 85, 86, 88, 91, 93. 
zs: MMIX-ARITH M, 44, 46, 47, 48, 49, M, 

53, 87, 91, 93. 

zset: MMIX-CONFIG 28, MMIX-PIPE 51, 345. 

ZSEV : MMIX 16, MMIX-PIPE 47, MMIX-SIM 54, 

92, MMIXAL 63. 

ZSEVI : MMIX-PIPE 47, MMIX-SIM 92. 

ZSN: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSNI: MMIX-PIPE 47, MMIX-SIM 92.

ZSNN: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSNNI: MMIX-PIPE 47, MMIX-SIM 54, 92.

ZSNP: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSNPI: MMIX-PIPE 47, MMIX-SIM 54, 92.

ZSNZ: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSNZI: MMIX-PIPE 47, MMIX-SIM 54, 92.

ZSOD: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSODI: MMIX-PIPE 47, MMIX-SIM 54, 92.

ZSP: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSPI: MMIX-PIPE 47, MMIX-SIM 92.

ZSZ: MMIX 16, MMIX-PIPE 47, MMIX-SIM 54,

92, MMIXAL 63.

ZSZI: MMIX-PIPE 47, MMIX-SIM 92.

zt: MMIX-ARITH M, M, 52,

88, 91, 93.

zz: MMIX-ARITH 24, MMIX-PIPE M, 46, 100,
103, 104, 118, 146, 320, 322, 323, 328, 
337, 338, 339, 341, 355, 356, 372, 373, 
MMIX-SIM 60, 62, 71, 72, 97, 102, 107, 
108, 109, 124, 133, 138. 

16ADDU : MMIX 9, MMIXAL 63. 

2ADDU : MMIX 9, MMIXAL 63. 

4ADDU : MMIX 9, MMIXAL 63. 

8ADDU : MMIX 9, MMIXAL 63. 



